Bug #1717

Python segfault during GFilename.is_fits() using OpenMP

Added by Mayer Michael about 8 years ago. Updated about 8 years ago.

Status: Closed
Start date: 02/28/2016
Priority: Normal
Due date:
Assigned To: Knödlseder Jürgen
% Done: 100%
Category: -
Target version: 1.1.0
Duration:

Description

I have a problem on the Zeuthen batch farm (Scientific Linux 6.6, Python 2.6.6, cfitsio: 3.340):
Running ctlike stops with a segmentation fault from time to time. There is no clear pattern to when it happens:

$ ctlike debug=yes
Input event list, counts cube or observation definition XML file [myinobs.xml]
Input model XML file [inmodel.xml]
Output model XML file [outmodel.xml]
...
2016-02-28T10:58:14: +=================================+
2016-02-28T10:58:14: | Maximum likelihood optimisation |
2016-02-28T10:58:14: +=================================+
[1]    106012 segmentation fault  ctlike debug=yes

I have tested it with HESS data and also with simulated CTA data (running csobsdef, ctobssim, ctselect, ctlike in a row). The problem occurs in both cases.

Here is the backtrace from gdb:

(gdb) backtrace
#0  0x0000003e8d867404 in fread () from /lib64/libc.so.6
#1  0x00007ffff731fcb3 in file_read ()
   from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
#2  0x00007ffff731a474 in ffread ()
   from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
#3  0x00007ffff73117ef in ffldrc ()
   from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
#4  0x00007ffff731dd07 in ffopen ()
   from /afs/ifh.de/group/hess/scratch/software/stable/cfitsio/sl6/cfitsio-3.340-gcc-4.8.1/lib/libcfitsio.so
#5  0x00007ffff7ae44b1 in GFilename::is_fits() const () at GFilename.cpp:267
#6  0x00007ffff7ae4551 in GFilename::exists() const () at GFilename.cpp:226
#7  0x00007ffff7c7641b in GCTAEventList::fetch() const () at src/GCTAEventList.cpp:646
#8  0x00007ffff7c77029 in GCTAEventList::operator[](int const&) const () at src/GCTAEventList.cpp:215
#9  0x00007ffff7bbcaa9 in GObservation::likelihood_poisson_unbinned(GModels const&, GVector*, GMatrixSparse*, double*) const () at GObservation.cpp:924
#10 0x00007ffff7bbafee in GObservation::likelihood(GModels const&, GVector*, GMatrixSparse*, double*) const ()
    at GObservation.cpp:197
#11 0x00007ffff7bb991b in GObservations::likelihood::eval () at GObservations_likelihood.cpp:270
#12 0x00007ffff70e835a in gomp_thread_start () at ../../../gcc-4.8.1/libgomp/team.c:115
#13 0x0000003e8e407aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003e8d8e893d in clone () from /lib64/libc.so.6

So there seems to be a problem with the file I/O access. I don’t see this problem on my Mac, where I don’t have OpenMP.
Since the crash occurs inside an OpenMP thread (frame #12, gomp_thread_start), it could be a problem with parallel access to some FITS files.

I have also tried compiling gammalib and ctools without OpenMP support (using the configure option --disable-openmp).
Then everything works smoothly, which supports the above hypothesis.

I also have the impression that the larger the observation container, the more likely the code is to fail. Could someone try to reproduce this error on a system that supports OpenMP?
Here is the command sequence:

$ csobsdef
Input pointing definition file [pnt.dat] 
Output observation definition XML file [obs.xml] 
Pointing duration (seconds) [1800.0] 

$ ctobssim inobs=obs.xml
Calibration database [prod2] 
Instrument response function [South_50h] 
Lower energy limit (TeV) [0.5] 
Upper energy limit (TeV) [50] 
Radius of FOV (degrees) (0-180) [3.0] 
Input model XML file [$CTOOLS/share/models/crab.xml] 
Output event data file or observation definition XML file [sim_events.xml] 

$ ctselect usepnt=yes usethres=DEFAULT
Input event list or observation definition XML file [sim_events.xml] 
Radius of ROI (degrees) (0-180) [2.5] 
Start time (CTA MET in seconds) [INDEF] 
Lower energy limit (TeV) [0.6] 
Upper energy limit (TeV) [20.0] 
Output event list or observation definition XML file [sel_obs.xml] 

$ ctlike
Input event list, counts cube or observation definition XML file [sel_obs.xml] 
Input model XML file [$CTOOLS/share/models/crab.xml] 
Output model XML file [optmodel.xml] 

The input file pnt.dat is attached.

pnt.dat (655 Bytes) Mayer Michael, 02/28/2016 12:33 PM


Recurrence

No recurrence.

History

#1 Updated by Knödlseder Jürgen about 8 years ago

  • Status changed from New to Feedback
  • Assigned To set to Knödlseder Jürgen
  • Target version set to 1.1.0
  • % Done changed from 0 to 100

I put the relevant code in a critical OMP zone which should make this thread safe.

Code is in devel.

Can you check if this solves your problem?
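
For readers unfamiliar with the approach in update #1, here is a minimal sketch of guarding a file probe with an OpenMP critical zone. It is not the code committed to devel: the helper names looks_like_fits and count_fits_probes, the zone name fits_probe, and the simple check for the "SIMPLE" keyword are all assumptions made for illustration, and plain C file I/O stands in for the cfitsio call seen in the backtrace.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not GammaLib code): returns true if the file can be
// opened and starts with the FITS keyword "SIMPLE".
bool looks_like_fits(const std::string& filename)
{
    bool result = false;

    // Only one thread at a time may execute this block. When compiled
    // without OpenMP the pragma is ignored and the block runs as usual.
    #pragma omp critical(fits_probe)
    {
        std::FILE* fptr = std::fopen(filename.c_str(), "rb");
        if (fptr != NULL) {
            char buffer[6];
            if (std::fread(buffer, 1, sizeof(buffer), fptr) == sizeof(buffer)) {
                result = (std::memcmp(buffer, "SIMPLE", 6) == 0);
            }
            std::fclose(fptr);
        }
    }

    // Return outside the critical zone: branching out of it is not allowed
    return result;
}

// Hypothetical driver: probes the same file n times from a parallel loop
// and counts how many probes succeeded.
int count_fits_probes(const std::string& filename, int n)
{
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; ++i) {
        if (looks_like_fits(filename)) {
            ++count;
        }
    }
    return count;
}
```

Calling count_fits_probes("events.fits", 64) from a build with -fopenmp exercises the guarded probe from several threads at once. The critical zone serialises the probes, so it trades some parallelism for safety, which is acceptable when the probe is short compared with the likelihood computation.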

#2 Updated by Mayer Michael about 8 years ago

Thanks for the quick feedback. I have tested the new code and everything works fine now.

#3 Updated by Knödlseder Jürgen about 8 years ago

  • Status changed from Feedback to Closed
