Feature #565

Updated by Deil Christoph over 11 years ago


We want to make it possible to run analyses of HESS data with ctools.

Here we give an overview of how HESS data is currently handled in the HESS software, what we want for ctools and finally the concrete tasks required to make it happen.

Once the HESS data (this issue) and IRFs (issue #536) are available from gammalib / ctools, we can discuss tools and toolchains for actual HESS data analysis.

The HESS data is private to the HESS collaboration; still, it makes sense to develop the formats / tools that make this possible partly in the gammalib / ctools repository. A very small subset of the HESS data has already been released as part of the first CTA data challenge (1DC), and it is planned to release further, larger subsets for future CTA data challenges. An alternative to using real data to test the data handling / tool chains would be to simulate thousands of HESS or CTA runs for a future data challenge.

h1. HESS data processing with findruns / HAP

Let me start by briefly describing how an end-user runs a HESS analysis in Heidelberg. I believe this info will be useful when discussing the formats and tools we want for gammalib / ctools.

* First a runlist is created for the target of interest, e.g.
<pre>
findruns.pl --name Crab --radius 3 --mintels 3 --selection spectral > crab_runs.lis
(actually there's a dozen options)
</pre>
This will query a MySQL database table with run quality information (and a table of common sources, as a convenient alternative to giving sky position coordinates directly) and write an ASCII file like
<pre>
# RunNumber TelescopePattern
23037 30
24099 14
</pre>
which specifies for each run which telescopes have good-quality data that should be used in analysis.
* Then the user runs an analysis, e.g.
<pre>
hap --runlist crab_runs.lis --config std --Background/Method ReflectedBg --Spectrum/Method Standard
(actually there's about 100 options)
</pre>
which for each run in the runlist reads one ROOT data file in @$HESSDST@ (a folder specified via an environment variable), where runs are stored according to a fixed scheme: sub-folders each covering 200 runs, e.g. @run030000-030199@, with filenames such as @run_030011_DST_001.root@ (a small path-construction sketch follows this list).
Note that the DST input files for HAP contain shower images (after tail cuts) and Hillas parameters for all events, so they are quite large (a few hundred MB per run). HAP runs direction and energy reconstruction based on the telescope pattern from the runlist and then performs gamma-hadron separation based on the chosen "config". Only then does the high-level spectral or morphological analysis start, using the post-selection event list and pre-computed lookups appropriate for this telescope pattern and config.
See issue #536 for a discussion on how to integrate the HESS lookups in gammalib / ctools.
HAP then creates an output ROOT file with a Stats, Spectrum or Map object from which end results can easily be obtained e.g. by fitting a model.
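
Since the DST path scheme above is fixed, it can be reproduced with a few lines of code. The following is a minimal Python sketch; the 6-digit zero padding and the trailing @_001@ segment are assumptions based only on the example filename above, not a confirmed rule:
<pre>
import os

def dst_path(run, hessdst=os.environ.get("HESSDST", ".")):
    """Guess the DST file path for a run, following the scheme described above."""
    lo = (run // 200) * 200   # first run covered by the sub-folder
    hi = lo + 199             # last run covered by the sub-folder
    subdir = "run{:06d}-{:06d}".format(lo, hi)
    filename = "run_{:06d}_DST_001.root".format(run)  # '_001' assumed constant
    return os.path.join(hessdst, subdir, filename)

print(dst_path(30011))  # -> $HESSDST/run030000-030199/run_030011_DST_001.root
</pre>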

The database tables and the files in @$HESSDST@ are created and maintained by experts; the end user simply copies and uses them.

h1. HESS data processing with gammalib / ctools

We want to define formats and implement tools that make it possible to run similar analyses of HESS data with ctools, with one very important difference: for now we want to create post-selection FITS event lists for a given "telescope pattern" and "config" for each run with HAP, and only run the high-level analysis with ctools (i.e. no shower image analysis, Hillas parameters, direction and energy reconstruction, or gamma-hadron separation). Using the HD cluster I could create, within a day, six event list versions for each run, covering the common data quality choices ("spectral" and "detection") and selection cuts ("hard", "std" and "loose"). By only supporting these most common data quality and selection cuts we also get large savings for the end user in disk space and processing speed (far fewer events have to be processed, using fewer lookups). The price is losing the flexibility to use your own data quality and gamma-hadron separation cuts, which is rarely done anyway because it also requires creating custom lookups, i.e. processing about a terabyte of gamma simulations.
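
To make the "six event list versions per run" concrete, here is a minimal Python sketch that enumerates the quality / cut combinations and opens one hypothetical post-selection event list; the file naming scheme and the @EVENTS@ table layout are purely illustrative assumptions, since the actual format is one of the things still to be defined (see the tasks below):
<pre>
from itertools import product
from astropy.io import fits

qualities = ["spectral", "detection"]   # data quality selections
cut_configs = ["hard", "std", "loose"]  # gamma-hadron separation configs

run = 23037
for quality, cuts in product(qualities, cut_configs):
    # Hypothetical per-run file name, for illustration only.
    print("run{:06d}_{}_{}_events.fits".format(run, quality, cuts))

# Reading such a post-selection event list would then be plain FITS access,
# assuming a standard EVENTS binary table extension:
with fits.open("run023037_spectral_std_events.fits") as hdulist:
    events = hdulist["EVENTS"].data
    print(len(events), "post-selection events")
</pre>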

h1. Tasks

So actually converting the HESS data for ctools is relatively straightforward; I think the work can be split into these subtasks:
A) Create FITS event lists for all good-quality HESS data for the common configs and distribute / store them in some well-defined form.
B) Create a run info table summarising which runs are available (locally and / or for download).
C) Create a ctfindruns tool (findruns equivalent) that queries the run info table for a given target, data quality selection and gamma-hadron selection cuts, creating a run or file list (a rough sketch of such a query is given below).
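
As a rough illustration of what the ctfindruns query in task C could look like, here is a minimal Python sketch of a cone search plus quality / cuts selection on a run info table; the file name @runinfo.fits@, the column names and the function name are placeholders, since the actual table layout is exactly what the sub-tasks are meant to define:
<pre>
from astropy.coordinates import SkyCoord
from astropy.table import Table
import astropy.units as u

def find_runs(runinfo_file, target, radius_deg, quality, cuts):
    """Select runs near a target position with a given quality / cuts config.

    Column names (RUN_ID, RA_PNT, DEC_PNT, QUALITY, CUTS) are placeholders.
    """
    table = Table.read(runinfo_file)
    pointing = SkyCoord(table["RA_PNT"], table["DEC_PNT"], unit="deg")
    near = pointing.separation(target) < radius_deg * u.deg
    good = (table["QUALITY"] == quality) & (table["CUTS"] == cuts)
    return table["RUN_ID"][near & good]

# Mirrors the findruns.pl example from the first section:
crab = SkyCoord.from_name("Crab")  # or give RA / Dec directly
runs = find_runs("runinfo.fits", crab, radius_deg=3, quality="spectral", cuts="std")
print(list(runs))
</pre>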

I will create sub-tasks to discuss the details (file format, location, content) of these three tasks.
