Bug #1717
Python segfault during GFilename.is_fits() using OpenMP
Status: | Closed | Start date: | 02/28/2016
---|---|---|---
Priority: | Normal | Due date: |
Assigned To: | Knödlseder Jürgen | % Done: | 100%
Category: | - | |
Target version: | 1.1.0 | |
Duration: | | |
Description
I have a problem on the Zeuthen batch farm (Scientific Linux 6.6, Python 2.6.6, cfitsio 3.340): running ctlike stops with a segmentation fault from time to time. There is no obvious pattern to when it happens:
    $ ctlike debug=yes
    Input event list, counts cube or observation definition XML file [myinobs.xml]
    Input model XML file [inmodel.xml]
    Output model XML file [outmodel.xml]
    ...
    2016-02-28T10:58:14: +=================================+
    2016-02-28T10:58:14: | Maximum likelihood optimisation |
    2016-02-28T10:58:14: +=================================+
    [1]    106012 segmentation fault  ctlike debug=yes
I have tested it with HESS data and also with simulated CTA data (running csobsdef, ctobssim, ctselect and ctlike in a row). The problem occurs in both cases.
Here is the backtrace from gdb:

    (gdb) backtrace
    #0  0x0000003e8d867404 in fread () from /lib64/libc.so.6
    #1  0x00007ffff731fcb3 in file_read () from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
    #2  0x00007ffff731a474 in ffread () from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
    #3  0x00007ffff73117ef in ffldrc () from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
    #4  0x00007ffff731dd07 in ffopen () from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
    #5  0x00007ffff7ae44b1 in GFilename::is_fits() const () at GFilename.cpp:267
    #6  0x00007ffff7ae4551 in GFilename::exists() const () at GFilename.cpp:226
    #7  0x00007ffff7c7641b in GCTAEventList::fetch() const () at src/GCTAEventList.cpp:646
    #8  0x00007ffff7c77029 in GCTAEventList::operator[](int const&) const () at src/GCTAEventList.cpp:215
    #9  0x00007ffff7bbcaa9 in GObservation::likelihood_poisson_unbinned(GModels const&, GVector*, GMatrixSparse*, double*) const () at GObservation.cpp:924
    #10 0x00007ffff7bbafee in GObservation::likelihood(GModels const&, GVector*, GMatrixSparse*, double*) const () at GObservation.cpp:197
    #11 0x00007ffff7bb991b in GObservations::likelihood::eval () at GObservations_likelihood.cpp:270
    #12 0x00007ffff70e835a in gomp_thread_start () at ../../../gcc-4.8.1/libgomp/team.c:115
    #13 0x0000003e8e407aa1 in start_thread () from /lib64/libpthread.so.0
    #14 0x0000003e8d8e893d in clone () from /lib64/libc.so.6
So there seems to be a problem with file I/O access. I don't see this problem on my Mac, where I don't have OpenMP.
In my view, could it therefore be a problem with parallel (multi-threaded) access to the FITS files?
I have also tried compiling gammalib and ctools without OpenMP support (using the configure option --disable-openmp).
Then everything works smoothly, which supports the above hypothesis.
I also have the feeling that the larger the observation container, the more likely the code is to fail. Could someone reproduce this error on a system that supports OpenMP?
Here is the command sequence:
    $ csobsdef
    Input pointing definition file [pnt.dat]
    Output observation definition XML file [obs.xml]
    Pointing duration (seconds) [1800.0]
    $ ctobssim inobs=obs.xml
    Calibration database [prod2]
    Instrument response function [South_50h]
    Lower energy limit (TeV) [0.5]
    Upper energy limit (TeV) [50]
    Radius of FOV (degrees) (0-180) [3.0]
    Input model XML file [$CTOOLS/share/models/crab.xml]
    Output event data file or observation definition XML file [sim_events.xml]
    $ ctselect usepnt=yes usethres=DEFAULT
    Input event list or observation definition XML file [sim_events.xml]
    Radius of ROI (degrees) (0-180) [2.5]
    Start time (CTA MET in seconds) [INDEF]
    Lower energy limit (TeV) [0.6]
    Upper energy limit (TeV) [20.0]
    Output event list or observation definition XML file [sel_obs.xml]
    $ ctlike
    Input event list, counts cube or observation definition XML file [sel_obs.xml]
    Input model XML file [$CTOOLS/share/models/crab.xml]
    Output model XML file [optmodel.xml]
The input file pnt.dat is attached.
Recurrence
No recurrence.
History
#1 Updated by Knödlseder Jürgen almost 9 years ago
- Status changed from New to Feedback
- Assigned To set to Knödlseder Jürgen
- Target version set to 1.1.0
- % Done changed from 0 to 100
I put the relevant code in an OpenMP critical section, which should make it thread-safe.
The code is in devel.
Can you check whether this solves your problem?
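The patch itself is not shown in this issue. The following is a minimal sketch of the general pattern, assuming a class that loads its events lazily on first access through a non-thread-safe reader (as the backtrace suggests, where `GCTAEventList::operator[]` calls `fetch()` from inside an OpenMP thread). The check-and-load is wrapped in a named OpenMP critical section so that concurrent calls from a parallel loop cannot run the read at the same time. `LazyEventList`, its `fetch()` and `load_count()` members, and `parallel_sum()` are hypothetical names for illustration, not the GammaLib API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event list that loads its events lazily on
// first access (not the real GCTAEventList).
class LazyEventList {
public:
    explicit LazyEventList(std::size_t n) : m_size(n) {}

    // Make sure the events are loaded, then return event i.
    double operator[](std::size_t i) {
        assert(i < m_size);
        fetch();
        return m_events[i];
    }

    std::size_t size() const { return m_size; }
    int load_count() const { return m_loads; }

private:
    void fetch() {
        // Named critical section: only one thread at a time runs the
        // check-and-load, so the non-thread-safe read happens exactly once.
        // Entering and leaving the region implies a memory flush, so every
        // thread sees the loaded events once it has passed through here.
        #pragma omp critical(LazyEventList_fetch)
        {
            if (!m_loaded) {
                // Stand-in for the FITS read (ffopen/fread in the backtrace).
                m_events.resize(m_size);
                for (std::size_t k = 0; k < m_size; ++k) {
                    m_events[k] = static_cast<double>(k);
                }
                ++m_loads;
                m_loaded = true;
            }
        }
    }

    std::size_t         m_size;
    bool                m_loaded = false;
    int                 m_loads  = 0;
    std::vector<double> m_events;
};

// Read every event from several threads, as a parallel likelihood loop might.
double parallel_sum(LazyEventList& list) {
    double     sum = 0.0;
    const long n   = static_cast<long>(list.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += list[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

Compile with, for example, `g++ -std=c++11 -fopenmp`; without `-fopenmp` the pragmas are ignored and the loop simply runs serially. For `LazyEventList list(100)`, `parallel_sum(list)` returns 4950.0 and `list.load_count()` is 1. Naming the critical section keeps it from sharing the single global lock used by all unnamed critical regions. The trade-off is that every access now enters the critical region, which serialises a small amount of work per call.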
#2 Updated by Mayer Michael almost 9 years ago
Thanks for the quick feedback. I have tested the new code and everything works fine now.
#3 Updated by Knödlseder Jürgen almost 9 years ago
- Status changed from Feedback to Closed