Action #811: Add sin() and cos() cache to GSkyDir - GammaLib - CTA IRAP Project Gateway

Action #811

Feature #810: Optimize computations

Add sin() and cos() cache to GSkyDir

Added by Knödlseder Jürgen almost 12 years ago. Updated about 11 years ago.

Status:

Closed

Start date:

03/28/2013

Priority:

Normal

Due date:

Assigned To:

Knödlseder Jürgen

% Done:

100%

Category:

Estimated time:

0.00 hour

Target version:

1.0.0

Duration:

Description

The dist() method is one of the most used function of the GSkyDir class. When called repeatedly, the dist() method recomputes for the moment the sin() and cos() of the coordinates. This can be prevented by adding a sin() and cos() cache that stores the values once computed.

The drawback is that the cache will considerably increase the size of the GSkyDir class, as there are 4 coordinates (RA, DEC, GLON, GLAT) and (at least) two functions (sin, cos), requiring 8 double precision values. In addition, we may want to store the tan() results that are used in posang(). And we need flags that signal that a cache has been set (this could eventually be done by Bit checking, yet Bit checking is slightly slower than bool checking ... at least on my Mac OS X).

GSkyDir.hpp (2.32 KB) Knödlseder Jürgen, 03/28/2013 04:15 PM

GSkyDir.cpp (7.26 KB) Knödlseder Jürgen, 03/28/2013 04:15 PM

Recurrence

No recurrence.

History

#1 Updated by Knödlseder Jürgen almost 12 years ago

To avoid a large amount of memory for the pre computation cache, we can try a logic where the pre computation cache is allocated dynamically. We may simply define a structure

struct cache {
    double m_sin_lat;
    double m_cos_lat;
};

that is dynamically allocated when necessary. We then just have to add two pointers

mutable cache* m_cel_cache;
mutable cache* m_gal_cache;

for the cache for celestial and galactic coordinates. By default, the pointers will be set to NULL, which is used as a flag for having the cache set or not. We then add methods

void set_cel_cache(void) const
{
    if (m_cel_cache == NULL) {
        if (!m_has_radec && m_has_lb) { // Make sure that RA, DEC are valid
            gal2equ();
        }
        m_cel_cache = new cache;
        m_cel_cache->m_sin_lat = std::cos(m_dec);
        m_cel_cache->m_cos_lat = std::cos(m_dec);
    }
    return;
}

to set the cache values. In addition, we can add methods

double sin_dec(void) const
{
    set_cel_cache();
    return m_cel_cache->m_sin_lat;
}

etc. for returning the cache values. When values are set, we simply deallocate the cache to make sure it will be recomputed the next time when it is needed

if (m_cel_cache != NULL) delete m_cel_cache;

This will bring a speed penalty due to memory allocation and deallocation, yet I guess that this is easily recovered after using the cache a few times. To be tested.

#2 Updated by Knödlseder Jürgen almost 12 years ago

File GSkyDir.hpp added
File GSkyDir.cpp added

I did some timing tests, the relevant code is GSkyDir.hpp and GSkyDir.cpp. For class variants have been implemented:

GSkyDir implements the actual dist computation
GSkyDir2 implements a precomputation cache and the methods sin_dec and cos_dec
GSkyDir3 implements a precomputation cache and the cache computations are directly performed in the relevant class
GSkyDir4 implements a dynamic allocation of a precomputation cache, and the cache computations are directly performed in the relevant class

The main difference between GSkyDir2 and GSkyDir3 is that no overhead in calling sin_dec and cos_dec is needed in the latter, while the former needs function calls, although the sin_dec and cos_dec methods have been declared inline.

Computations have been done using the nested loop


    // Loop
    double sum = 0.0;
    for (int outer = 0; outer < nouter; ++outer) {
        GSkyDir crab( 83.6331,  22.0145);
        GSkyDir vela(128.8361, -45.1764);
        for (int inner = 0; inner < ninner; ++inner) {
            double dist = crab.dist(vela);
            sum += dist;
        }
    }

where nouter and ninner are the number of iterations in the outer and inner loop, respectively. This allows to study the speed gain when reusing the cache values.

Here a CPU benchmark on Mac OS X for 10000000 / ninner iterations, where ninner was selected to be 1, 2, 3, 10, 100 and 1000:

Class	1	2	3	10	100	1000
GSkyDir	1.41949	1.36537	1.32523	1.21473	1.22217	1.19108
GSkyDir2	1.6772	1.21985	1.06055	0.846352	0.771483	0.761635
GSkyDir3	1.47498	1.07373	0.917423	0.632076	0.552024	0.542549
GSkyDir4	3.15931	1.9054	1.45324	0.838957	0.577843	0.604065

GSkyDir is obviously fastest when the dist computation is never reused (ninner=1). GSkyDir3 comes closets to the results, since only little overhead is involved with testing the existence of precomputed values. GSkyDir4 is worst, as the dynamic allocation of the cache memory takes some time. The dynamic allocation penalty is only recovered after a few reuses of the values.

If the precomputed values are reused (ninner>1), GSkyDir3 works best, taking only 45% of the time needed for the original method for (ninner>100). The drawback of this method is that the memory held by the class is essentially doubled with respect to the original class. If precomputed values are dynamically allocated, about 50% of the time is needed for (ninner>100).

For comparison, here the same table on galileo (32 Bit Linux system):

Class	1	2	3	10	100	1000
GSkyDir	7.01	6.87	6.83	6.57	6.58	6.61
GSkyDir2	7.34	5.24	4.57	3.63	3.27	3.24
GSkyDir3	7.1	4.99	4.29	3.3	2.91	2.88
GSkyDir4	8.85	5.9	4.88	3.48	2.94	2.89

The general behavior is similar, yet the dynamic memory allocation penalty is considerably smaller. With a single reuse of the cache, the GSkyDir4::dist method becomes faster than the GSkyDir::dist method. For (ninner>100), the speed penalty for dynamic memory allocation completely vanished.

And finally the same table for kepler (64 Bit Linux system):

Class	1	2	3	10	100	1000
GSkyDir	4.35	4.31	4.16	4.01	3.96	3.95
GSkyDir2	4.9	3.36	2.93	2.14	1.85	1.82
GSkyDir3	4.52	3.02	2.61	1.85	1.57	1.52
GSkyDir4	6.05	3.86	3.19	2.04	1.61	1.57

Here the dynamic memory allocation penalty is larger, but the code is still faster than the original code for (ninner>1). The computation time for large ninner is 40% of the original time.

Conclusion: The implementation of a cache should be considered. Using additional members roughly doubles the size of GSkyDir, but reduces the computing time to 40-45% of the actual duration on all tested systems. Using a dynamically allocated cache will slow down significantly the computations if the values are never reused, hence this approach should not be pursued.