Change request #2096

ctlike is very slow when a mapcube is used in the xml model

Added by Di Venere Leonardo over 5 years ago. Updated over 5 years ago.

Status:NewStart date:05/02/2017
Priority:NormalDue date:
Assigned To:-% Done:

0%

Category:-
Target version:-
Duration:

Description

I tried to fit a binned event cube including a diffuse model as background (in addition to the IRF background) in the xml file. This model is described by a constant spectum and a DiffuseMapCube as spatial model. With this setup ctlike becomes very slow compared to the case in which the diffuse source is not in the model.
For comparison, I ran ctlike in debug mode and found out that when the diffuse mapcube is not in the model one iteration takes approximately 2-3 seconds, while when the source is included in the model, it takes 2-3 minutes.
This is valid regardless whether the diffuse source is fitted or fixed.
I guess this is expected for extended sources, due to the PSF convolution. However, for a DiffuseMapCube, the PSF convolution is not needed at each iteration step, since the spatial model never changes. The same issue is probably also true for the DiffuseMap case (but I haven’t tested it).

ctbin2.log (3.46 KB) Di Venere Leonardo, 05/23/2017 10:39 PM

ctobssim2.log (17.2 KB) Di Venere Leonardo, 05/23/2017 10:39 PM

LSI.xml Magnifier (1.46 KB) Di Venere Leonardo, 05/23/2017 10:39 PM

fit1.xml Magnifier (1.27 KB) Di Venere Leonardo, 05/23/2017 10:39 PM

fit2.xml Magnifier (1.15 KB) Di Venere Leonardo, 05/23/2017 10:39 PM

ctmapcube.log (4.77 KB) Di Venere Leonardo, 05/23/2017 10:40 PM


Recurrence

No recurrence.

History

#1 Updated by Di Venere Leonardo over 5 years ago

I looked a bit into the code and I found two possible reasons (maybe they are not the only ones):
  1. during the fit the diffuse source is loaded multiple times (at least once at each iteration) by the method GModelSpatialDiffuseCube::load(), lines 1179 and 1182
  2. I don’t completely understand whether the diffuse map is convolved with the PSF only once (at the beginning of the fit procedure) or the convolution is recomputed at each iteration.
By running the code in debug mode (using gdb), I found out that probably the first reason is the most time consuming, especially when the diffuse mapcube is stored in a big file. In fact, I reduced the file dimension using ctmapcube to generate a diffuse model file restricted to my ROI. This improves the computation time necessary for this method:
  • with a full sky model map: 5 minutes per iteration
  • with reduced map (using ctmapcube): 2-3 minutes per iteration (this is the number I put in the description above)

I don’t know if this is actually related to some memory issues on the machine I’m working on, which could be for some reason slow in loading these big files. I’ll test the same thing on the cluster as soon as I can.
Anyway, I think that there should be a way to keep in memory the loaded and convolved maps in order to avoid reading the files and remake the convolution at each iteration.
Furthermore, I don’t know if it might be useful to store this convolved map in a separate file, similarly to what gtsrcmaps does in Fermi tools. This might be useful if one has to run multiple fits (or for example in tools like cttsmap or cssens) on the same region and/or may want to generate a model map. Nevertheless, I guess that there will not be many diffuse sources in the model for a single ROI, so probably the time saved would be just a few minutes per run.

Finally, a similar issue is valid for sources that are kept fixed in the model, since the computation time of each iteration does not change when I fix the diffuse source. However, for fixed sources (not only diffuse sources) the model map could be calculated only once and stored before starting the fit.

I don’t know how much work these improvements would require and if they are worth the effort indeed.

#2 Updated by Di Venere Leonardo over 5 years ago

Di Venere Leonardo wrote:

I don’t know if this is actually related to some memory issues on the machine I’m working on, which could be for some reason slow in loading these big files. I’ll test the same thing on the cluster as soon as I can.

I tested the same code on the cluster of my home institution. The issue seems to be the same. I tested two mapcube files: the first with dimension 400x400x32 (approx. 40 MB) and the second with dimension 7200x481x32 (approx. 800MB after gzip compression).
  • without mapcube: 2 seconds / iteration
  • with the mapcube of 40MB: 2-3 minutes / iteration
  • with the mapcube of 800MB: 3-4 minutes / iteration

The fact that the computation time doen’t increase much with the increasing number of pixels suggests that probably the actual computation of the model is not too much time consuming, while most of the time might be spent on other operations, such as I/O operations as suggested in my previous comment.

#3 Updated by Knödlseder Jürgen over 5 years ago

Di Venere Leonardo wrote:

I looked a bit into the code and I found two possible reasons (maybe they are not the only ones):
  1. during the fit the diffuse source is loaded multiple times (at least once at each iteration) by the method GModelSpatialDiffuseCube::load(), lines 1179 and 1182
  2. I don’t completely understand whether the diffuse map is convolved with the PSF only once (at the beginning of the fit procedure) or the convolution is recomputed at each iteration.

Normally the file should be loaded only once. The GModelSpatialDiffuseCube::load_cube method is called by the GModelSpatialDiffuseCube::fetch_cube method that only loads the cube in case that file has not yet been loaded. What is your indication that the file is loaded at each iteration?

By running the code in debug mode (using gdb), I found out that probably the first reason is the most time consuming, especially when the diffuse mapcube is stored in a big file. In fact, I reduced the file dimension using ctmapcube to generate a diffuse model file restricted to my ROI. This improves the computation time necessary for this method:
  • with a full sky model map: 5 minutes per iteration
  • with reduced map (using ctmapcube): 2-3 minutes per iteration (this is the number I put in the description above)

The response computation for a stacked analysis and a diffuse model is done in GCTAResponseCube::irf_diffuse. The method uses a cache (class GCTACubeSourceDiffuse) so in principle the diffuse response should only be evaluated once. The convolution is done in GCTACubeSourceDiffuse::set which loops over all pixels of the data space. But maybe the speed of the model eval method depends on the size of the map (would need some testing).

I don’t know if this is actually related to some memory issues on the machine I’m working on, which could be for some reason slow in loading these big files. I’ll test the same thing on the cluster as soon as I can.
Anyway, I think that there should be a way to keep in memory the loaded and convolved maps in order to avoid reading the files and remake the convolution at each iteration.

That’s in principle done.

Furthermore, I don’t know if it might be useful to store this convolved map in a separate file, similarly to what gtsrcmaps does in Fermi tools. This might be useful if one has to run multiple fits (or for example in tools like cttsmap or cssens) on the same region and/or may want to generate a model map. Nevertheless, I guess that there will not be many diffuse sources in the model for a single ROI, so probably the time saved would be just a few minutes per run.

This could be achieved by adding load() and save() methods to GCTACubeSourceDiffuse.

Finally, a similar issue is valid for sources that are kept fixed in the model, since the computation time of each iteration does not change when I fix the diffuse source. However, for fixed sources (not only diffuse sources) the model map could be calculated only once and stored before starting the fit.

Agree. For the moment the fixed sources can be simply combined in a diffuse cube using ctmapcube. That’s how I in fact use the tool: splitting off the fitted sources into an XML file and the use the remaining fixed sources to create a diffuse map.

I don’t know how much work these improvements would require and if they are worth the effort indeed.

In principle these improvements should be there already.

#4 Updated by Di Venere Leonardo over 5 years ago

Thanks for the reply.

Knödlseder Jürgen wrote:

Normally the file should be loaded only once. The GModelSpatialDiffuseCube::load_cube method is called by the GModelSpatialDiffuseCube::fetch_cube method that only loads the cube in case that file has not yet been loaded. What is your indication that the file is loaded at each iteration?

I ran ctlike using gdb and found out that the GModelSpatialDiffuseCube::load_cube() method is called more than once (should be once per iteration).However, I am not completely sure that this is actually the method that slows down the code.

Knödlseder Jürgen wrote:

Agree. For the moment the fixed sources can be simply combined in a diffuse cube using ctmapcube. That’s how I in fact use the tool: splitting off the fitted sources into an XML file and the use the remaining fixed sources to create a diffuse map.

This is actually a good way to solve the problem. However, the issue is that the computation is slow when I use a mapcube, even if the mapcube is not fitted.

I don’t know where the issue comes from, but I find strange that ctlike takes much more time for each iteration when a mapcube is included in the model. I would expect that only the computation of the first iteration could/should be slower with respect to the case in which the mapcube is not used.

#5 Updated by Knödlseder Jürgen over 5 years ago

Can you somewhere post the data and parameters of the run so that I can reproduce the issue? The caching should work as described, but maybe something is going wrong at some point.

#6 Updated by Di Venere Leonardo over 5 years ago

I’m attaching the log files of ctobssim and ctbin, together with the input xml model LSI.xml, summarizing the parameters I used to generate the data. I also attach the xml files (fit1.xml and fit2.xml) used to test ctlike.
As mapcube, I used the cube_iem.fits.gz available in the models tarball on cta owncloud.
I also reduced the all sky iem to a smaller mapcube using ctmapcube (reducing the number of pixels). I attach also the log file of ctmapcube.

#7 Updated by Knödlseder Jürgen over 5 years ago

Thanks for the files. I reproduced your test case.

Here first a comment on the ctlike run. Fitting all 3 components (LSI, diffuse background, instrumental background) here is what the ctlike log file shows:

2017-05-24T12:29:58: +=================================+
2017-05-24T12:29:58: | Maximum likelihood optimisation |
2017-05-24T12:29:58: +=================================+
2017-05-24T12:30:26:  >Iteration   0: -logL=96840.565, Lambda=1.0e-03
2017-05-24T12:30:26:    Parameter "Normalization" drives optimization step (step=0.393656)
2017-05-24T12:30:26:  >Iteration   1: -logL=96839.313, Lambda=1.0e-03, delta=1.252, step=3.9e-01, max(|grad|)=112.223180 [Index:3]
2017-05-24T12:30:26:    Parameter "Normalization" does not drive optimization step anymore.
2017-05-24T12:30:26:    Parameter "Normalization" hits minimum: -1.23371 < 0 (1)
2017-05-24T12:30:27:  >Iteration   2: -logL=96838.865, Lambda=1.0e-04, delta=0.448, step=1.0e+00, max(|grad|)=-0.655386 [Index:3]
2017-05-24T12:30:27:    Parameter "Normalization" hits minimum: -1.23382 < 0 (2)
2017-05-24T12:30:27:  >Iteration   3: -logL=96838.865, Lambda=1.0e-05, delta=0.000, step=1.0e+00, max(|grad|)=-0.548665 [Index:3]
2017-05-24T12:30:27: 
2017-05-24T12:30:27: +====================================+
2017-05-24T12:30:27: | Maximum likelihood re-optimisation |
2017-05-24T12:30:27: +====================================+
2017-05-24T12:30:27:  >Iteration   0: -logL=454767.807, Lambda=1.0e-03
2017-05-24T12:30:27:  >Iteration   1: -logL=446006.707, Lambda=1.0e-03, delta=8761.100, step=1.0e+00, max(|grad|)=-391.342711 [Normalization:1]
2017-05-24T12:30:27:  >Iteration   2: -logL=430477.759, Lambda=1.0e-04, delta=15528.948, step=1.0e+00, max(|grad|)=-216.761074 [Normalization:1]
2017-05-24T12:30:27:  >Iteration   3: -logL=409335.795, Lambda=1.0e-05, delta=21141.964, step=1.0e+00, max(|grad|)=-111.189053 [Normalization:1]
2017-05-24T12:30:27:  >Iteration   4: -logL=387682.835, Lambda=1.0e-06, delta=21652.960, step=1.0e+00, max(|grad|)=-52.775260 [Normalization:1]
2017-05-24T12:30:27:  >Iteration   5: -logL=370859.324, Lambda=1.0e-07, delta=16823.511, step=1.0e+00, max(|grad|)=-22.704059 [Normalization:1]
2017-05-24T12:30:27:    Parameter "Normalization" drives optimization step (step=0.034221)
2017-05-24T12:30:27:  >Iteration   6: -logL=370375.183, Lambda=1.0e-08, delta=484.141, step=3.4e-02, max(|grad|)=-21.941653 [Normalization:1]
2017-05-24T12:30:27:    Parameter "Normalization" does not drive optimization step anymore.
2017-05-24T12:30:27:    Parameter "Normalization" hits maximum: 1636.15 > 1000 (1)
2017-05-24T12:30:27:  >Iteration   7: -logL=370375.183, Lambda=1.0e-09, delta=0.000, step=1.0e+00, max(|grad|)=-21.941653 [Normalization:1]

From the times you see that it takes about 30 second before Iteration 0 starts, which is the time for the internal convolution with the IRF. Subsequently, the times between iterations are one second or so, which shows that the caching actually works. But I will add a debug print to the code to see when actually the load method is called.

But the fit does not go so well, I suspect this is because the diffuse emission model is in fact very weak in your ROI. The simulations give the following number of events:
  • LSI: 44357
  • Diffuse background: 20
  • Instrumental background: 3646

Note, however, that I did an unbinned analysis. Did you do a binned analysis using an exposure cube and so on?

#8 Updated by Di Venere Leonardo over 5 years ago

Thanks for looking into that.
I did a binned analysis without any exposure cube. I’ve just used the keys “caldb” and “irf” to specify the response, since I had one observation only.
I also tried simulating a much higher flux for the diffuse mapcube, in order to avoid convergence problems, and had the same issue.

Regarding the unbinned case, I’ve just tested the unbinned analysis and also in my case there isn’t any issue. The first iteration takes some minutes, while the following ones just a few seconds.

#9 Updated by Knödlseder Jürgen over 5 years ago

Here now the same for a binned analysis. Again, the map cube computation takes some time before Iteration 0, but then the computation is pretty fast:

2017-05-24T13:48:33: +=================================+
2017-05-24T13:48:33: | Maximum likelihood optimisation |
2017-05-24T13:48:33: +=================================+
2017-05-24T13:49:26:  >Iteration   0: -logL=-123342.189, Lambda=1.0e-03
2017-05-24T13:49:26:  >Iteration   1: -logL=-123475.782, Lambda=1.0e-03, delta=133.593, step=1.0e+00, max(|grad|)=-665.038484 [Prefactor:9]
2017-05-24T13:49:26:  >Iteration   2: -logL=-123504.931, Lambda=1.0e-04, delta=29.149, step=1.0e+00, max(|grad|)=89.761593 [Index:10]
2017-05-24T13:49:27:  >Iteration   3: -logL=-123505.552, Lambda=1.0e-05, delta=0.622, step=1.0e+00, max(|grad|)=1.952417 [Index:10]
2017-05-24T13:49:27:  >Iteration   4: -logL=-123505.552, Lambda=1.0e-06, delta=0.000, step=1.0e+00, max(|grad|)=-0.000678 [Prefactor:9]
2017-05-24T13:49:27: 
2017-05-24T13:49:27: +====================================+
2017-05-24T13:49:27: | Maximum likelihood re-optimisation |
2017-05-24T13:49:27: +====================================+
2017-05-24T13:49:28:  >Iteration   0: -logL=231101.910, Lambda=1.0e-03
2017-05-24T13:49:28:  >Iteration   1: -logL=202959.867, Lambda=1.0e-03, delta=28142.043, step=1.0e+00, max(|grad|)=-28523.370517 [Prefactor:3]
2017-05-24T13:49:28:  >Iteration   2: -logL=177665.026, Lambda=1.0e-04, delta=25294.841, step=1.0e+00, max(|grad|)=-13498.471294 [Prefactor:3]
2017-05-24T13:49:28:    Parameter "Normalization" drives optimization step (step=0.469999)
2017-05-24T13:49:28:  >Iteration   3: -logL=165349.628, Lambda=1.0e-05, delta=12315.398, step=4.7e-01, max(|grad|)=-8955.610887 [Prefactor:3]
2017-05-24T13:49:28:    Parameter "Normalization" does not drive optimization step anymore.
2017-05-24T13:49:28:    Parameter "Normalization" hits minimum: -74.6759 < 0 (1)
2017-05-24T13:49:29:  >Iteration   4: -logL=142682.467, Lambda=1.0e-06, delta=22667.161, step=1.0e+00, max(|grad|)=-3780.976038 [Prefactor:3]
2017-05-24T13:49:29:    Parameter "Normalization" hits minimum: -521.215 < 0 (2)
2017-05-24T13:49:29:  >Iteration   5: -logL=121212.168, Lambda=1.0e-07, delta=21470.299, step=1.0e+00, max(|grad|)=-6971.690659 [Index:4]
2017-05-24T13:49:29:    Parameter "Normalization" hits minimum: -4150.62 < 0 (3)
2017-05-24T13:49:29:   Iteration   6: -logL=121212.168, Lambda=1.0e-08, delta=-4633.913, step=1.0e+00, max(|grad|)=-52936.297535 [Index:4] (stalled)
2017-05-24T13:49:29:    Parameter "Normalization" hits minimum 0 more than 3 times. Fix parameter at minimum for now.
2017-05-24T13:49:29:   Iteration   7: -logL=121212.168, Lambda=1.0e-07, delta=-4633.832, step=1.0e+00, max(|grad|)=-52936.135668 [Index:4] (stalled)
2017-05-24T13:49:30:   Iteration   8: -logL=121212.168, Lambda=1.0e-06, delta=-4633.026, step=1.0e+00, max(|grad|)=-52934.517056 [Index:4] (stalled)
2017-05-24T13:49:30:   Iteration   9: -logL=121212.168, Lambda=1.0e-05, delta=-4624.973, step=1.0e+00, max(|grad|)=-52918.336589 [Index:4] (stalled)
2017-05-24T13:49:30:   Iteration  10: -logL=121212.168, Lambda=1.0e-04, delta=-4544.786, step=1.0e+00, max(|grad|)=-52757.094869 [Index:4] (stalled)
2017-05-24T13:49:30:   Iteration  11: -logL=121212.168, Lambda=1.0e-03, delta=-3776.177, step=1.0e+00, max(|grad|)=-51199.103946 [Index:4] (stalled)
2017-05-24T13:49:31:  >Iteration  12: -logL=119722.434, Lambda=1.0e-02, delta=1489.734, step=1.0e+00, max(|grad|)=-39666.081675 [Index:4]
2017-05-24T13:49:31:  >Iteration  13: -logL=113535.931, Lambda=1.0e-03, delta=6186.502, step=1.0e+00, max(|grad|)=9271.839400 [Index:4]
2017-05-24T13:49:31:  >Iteration  14: -logL=111552.026, Lambda=1.0e-04, delta=1983.905, step=1.0e+00, max(|grad|)=2576.681674 [Index:4]
2017-05-24T13:49:31:  >Iteration  15: -logL=111432.103, Lambda=1.0e-05, delta=119.923, step=1.0e+00, max(|grad|)=206.188079 [Index:4]
2017-05-24T13:49:32:  >Iteration  16: -logL=111431.572, Lambda=1.0e-06, delta=0.531, step=1.0e+00, max(|grad|)=5.414784 [Index:4]
2017-05-24T13:49:32:  >Iteration  17: -logL=111431.572, Lambda=1.0e-07, delta=0.000, step=1.0e+00, max(|grad|)=0.179065 [Index:4]
2017-05-24T13:49:32:    Free parameter "Normalization" after convergence was reached with frozen parameter.
2017-05-24T13:49:32:  >Iteration  18: -logL=111431.572, Lambda=1.0e-03, delta=0.000, step=1.0e+00, max(|grad|)=2.387240 [Normalization:1]
2017-05-24T13:49:32:    Free parameter "Normalization" after convergence was reached with frozen parameter.
2017-05-24T13:49:32: 
2017-05-24T13:49:32: +============================================+
2017-05-24T13:49:32: | Maximum likelihood re-optimisation results |
2017-05-24T13:49:32: +============================================+
2017-05-24T13:49:32: === GOptimizerLM ===
2017-05-24T13:49:32:  Optimized function value ..: 111431.572
2017-05-24T13:49:32:  Absolute precision ........: 0.005
2017-05-24T13:49:32:  Acceptable value decrease .: 2
2017-05-24T13:49:32:  Optimization status .......: converged
2017-05-24T13:49:32:  Number of parameters ......: 7
2017-05-24T13:49:32:  Number of free parameters .: 3
2017-05-24T13:49:32:  Number of iterations ......: 18
2017-05-24T13:49:32:  Lambda ....................: 0.0001

#10 Updated by Knödlseder Jürgen over 5 years ago

Di Venere Leonardo wrote:

Thanks for looking into that.
I did a binned analysis without any exposure cube. I’ve just used the keys “caldb” and “irf” to specify the response, since I had one observation only.

Okay, I will also try that.

Just one remark: ctmapcube is in fact not needed here since the IEM model is already a map cube. In that case the ctmapcube will just map the input map cube into another binning.

Maybe you can post the ctlike log file so that I can see what exactly is happening.

#11 Updated by Knödlseder Jürgen over 5 years ago

I just recognised that in the specific case you are using there is no caching of the diffuse model ;)

The diffuse model is cached for:
  • event lists
  • counts cubes if a stacked analysis is used

But for counts cubes without stacked analysis (which is still supported but not really recommended for the analysis) no caching has been implemented. This probably explains your issue.

Here the first iterations I get:

2017-05-24T13:55:57: +=================================+
2017-05-24T13:55:57: | Maximum likelihood optimisation |
2017-05-24T13:55:57: +=================================+
2017-05-24T13:57:24:  >Iteration   0: -logL=-123495.576, Lambda=1.0e-03
2017-05-24T13:58:54:  >Iteration   1: -logL=-123507.444, Lambda=1.0e-03, delta=11.868, step=1.0e+00, max(|grad|)=-15.779149 [Prefactor:2]
2017-05-24T14:00:23:  >Iteration   2: -logL=-123507.450, Lambda=1.0e-04, delta=0.006, step=1.0e+00, max(|grad|)=-0.009824 [Prefactor:2]

#12 Updated by Di Venere Leonardo over 5 years ago

Knödlseder Jürgen wrote:

Just one remark: ctmapcube is in fact not needed here since the IEM model is already a map cube. In that case the ctmapcube will just map the input map cube into another binning.

Sure. I used ctmapcube to reduce the number of pixels of the mapcube and speed up the computation.

Knödlseder Jürgen wrote:

I just recognised that in the specific case you are using there is no caching of the diffuse model ;)

The diffuse model is cached for:
  • event lists
  • counts cubes if a stacked analysis is used

But for counts cubes without stacked analysis (which is still supported but not really recommended for the analysis) no caching has been implemented. This probably explains your issue.

Ok thanks! I will use the stacked analysis, then!
Is there a way to force the caching also in the non-stacked case?

From what I understand, for a binned analysis with a single observation (one event file and IRF) or with multiple observations having the same response functions (multiple event files, but common IRFs), the stacked analysis should not be necessary, in the sense that the non-stacked analysis with common “caldb” and “irf” parameters should work the same way. Am I right?
This is why I was not using the stacked analysis for my simulation.

Anyway, in the real cases we will have multiple observations with different IRFs, so the stacked analysis will be necessary.

Also available in: Atom PDF