Change request #558: Gammalib CDC: Coding and Design Conventions - GammaLib - CTA IRAP Project Gateway

Change request #558

Updated by Deil Christoph over 11 years ago

h1. Feedback on the document

Here's my feedback on the "Gammalib CDC: Coding and Design Conventions" draft from 2012-09-21, located at @doc/dev/coding/gammalib_coding.tex@ in the gammalib repo.

# This is definitely a very useful document for a new gammalib developer like myself. Would it be possible to create html versions of this document and the other latex documents (like inst/gammalib_inst.tex and maths/gammalib_maths.tex) from the top-level "make doxygen"? This would be nice to always have easily accessible with the doxygen API class documentation while coding.
# I think a high-level explanation of how the registries (GModelRadialRegistry, GModelRegistry, GModelSpatialRegistry, GModelSpectralRegistry, GModelTemporalRegistry, GObservationRegistry, GWcsRegistry) work should be added to this document or the Gammalib Software Design Description.
# I think a rule "use C++ (std::string) instead of C-style (char*) strings" should be explicitly mentioned.
This seems to be followed followed everywhere already, except why does GLog need a C-string interface in addition to the C++-string interface?
# I'll try to make an Eclipse code format (one XML file) that matches the gammalib rules. Then developers don't have to think about code formatting, but can just press CMD+Shift+F before saving and checking in.
# Why require a return statement in functions / methods returning void? This seems completely superfluous (i.e. a waste of screen real estate) to me.
# The statements "Do not use C++ namespaces" and "Do not use C++ templates" should be qualified to gammalib, obviously std::vector is used a lot., which is a template in a namespace. I don't know if this is up for discussion, but I'd prefer if gammalib did use namespaces and templates. I'll try to make my case a section below.
# Is it really necessary to include the license in each file? I don't like that whenever I open a file to read, all I see at first is the license and I have to scroll down to start reading. Many other projects just have one line per file "GPL 3 license, see LICENSE.txt", but I'm not even sure that is necessary.
# If you look at the example hpp and cpp and python interface files for GClass, you'll see the description "My nice class" appear 8 times (In each file also in short form on top of the license). This is bad because it's easy to forget to edit some of the comments when copying and modifying old classes to make new ones. I'd prefer exactly one description, right in front of class GClass in the header, where doxygen will pick it up. Every C++ programmer knows to look at the corresponding header file for a given cpp or swig interface file. Also please mention that this class template is available at @src/template/GClass.hpp@ and @src/template/GClass.cpp@ and should be copied and edited when writing a new class instead of starting from scratch.
# The @GEbounds@ example violates the rule "Use std::vector instead of new / delete where possible", I think it should use std::vector.
# For gammalib classes there are these recommendations "each C++ class should have the following private or protected methods for memory management: init_members(), copy_members(), free_members() and if new/delete is used alloc_members()" and "Every class should have at least a void constructor, a copy constructor, a destructor and an assignment operator" and then example implementations of these eight methods are given. Would it be worth to use an abstract base class for all gammalib classes (maybe called GClass or something else)? The advantage would be that the compiler helps the developer by enforcing these recommendations and maybe even save typing by giving the default implementations for the void constructor, copy constructor, destructor and assignment operator (I haven't thought about derived classes if there also default implementations would work). One disadvantage would be that if I want to take a gammalib class and use it in another project, I also have to copy GClass. I realize this would be a big change to gammalib, and Jürgen, if you consider it I think definitely a few C++ gurus should comment first.
# At the end of 4.2.2 you show an example assignment operator for a derived class, then at the end of section 4.2.3 you show an example void constructor for a derived class. You should also show example copy constructor and assignment operator implementations for a derived class (and a multiple inheritance example if that exists) and put all four derived class examples in the same section (4.2.3).
# Can you mention if gammalib ever use multiple inheritance or if it is allowed?
# I generally like the balance between short and explicit class names in gammalib. There's a few cases where I think more explicit names would be better, because then I can guess from the name what they are and remember them better while coding: "Ptsrc" -> 'PointSource", "Plaw" -> "PowerLaw", "Ran" -> "Random", "Inst" -> "Instrument", "XmlPI" -> "XMLProcessingInstruction"
# I did not understand (as a mediocre C++ programmer) the last two paragraphs in Section 4.2.4. Maybe giving an example would help?
# There's a few things in Section 4.2.5 I don't understand:
## Why does roi() and events() take and return pointers, but the other methods use references? You say the rule is "Pointers should only be used if NULL is a possible value", but why is NULL a possible value for an roi, but not a gti?
## Can you give an example that demonstrates the rule "never return base class objects by reference, as this will lead to code slicing if the method is used for object assignment"?
## I think in the examples of how method return values the comment should say "const class member reference" and not "const base class member reference" (otherwise, doesn't this example violate the rule just mentioned)?
## In the last paragraph you say "The ... methods do not return class members, hence they return an object by value". Why "hence", if that is a rule I would state it more prominently. Don't you have e.g. factory methods that create something and return a pointer and the user is responsible for deleting it?
# For the container classes you have a "list of mandatory methods for container classes". Again, wouldn't it be better to have the compiler help the developer by enforcing this interface via n abstract base class? Naively I would think again (as for GClass, see above) that the advantages of using a GContainer base class would outweigh the disadvantages.
# Why do you use 00-06-02 for the version number format? Almost all projects / package managers I know use 0.6.2.
# "The definition of pure virtual methods should be done in a section that is separate from the methods that are implemented." -> where? At the beginning or end? Best include this in the example.
# Write the python code example like this:
<pre>
>>> GClass c
>>> print c
</pre>
# At the end of section 4.2.1 you write "Do always check ...", "Do never allocate ...", "Do always initialise ...", I would simply write "Always check ...", "Never allocate ...", "Always initialise ...".
# At the beginning you mention "use std::vector or std::array containers". Actually std::array is never used in gammalib, so I would either just say "use std::vector as a container" or "use std::vector as a container and pre-allocate it's size when constructing it where possible for speed". I've never used std::array, so only using std::vector would (at least for me) simpler.
# Functions (meaning standalone, not class member functions) are never mentioned. Do they exist? Are they discouraged for some reason?
# Is this the right document to describe testing? I.e. that gammalib should be developed using test-driven development, i.e. each class should have C++ and python unit tests, how the test runner works locally and automatically on Jenkins.

In addition to this feedback on the CDC document I'd like to mention two important points where I would have preferred gammalib to be different.
Note that there is also a document "Gammalib SDD: Software Design Description" draft from 23 May 2010, located at @doc/dev/sdd/gammalib_sdd.tex@ in the gammalib repo which is relevant to this discussion, but it doesn't seem to be maintained. It would be very good to have such a high-level document for new developers or users trying to learn the gammalib API.

h1. General remarks on gammalib design decisions

h2. Use namespaces in gammalib to be more structured / modular?

Currently gammalib is composed of these 14 modules that are at the end linked together into libgamma (alphabetical order): app, cta, fits, lat, linalg, model, mwl, numerics, obs, opt, sky, support, test, xml
These correspond to the folders src/app, inst/cta, src/fits, inst/lat, src/linalg, src/model, inst/mwl, src/numerics, src/obs, src/opt, src/sky, src/support, src/test, src/xml
However the header files (except for the inst = instrument classes) are all in one include folder, currently containing 127 header files.

As a new gammalib developer, I find it very difficult to get an overview of which of the 127 classes (will become more in the future) depend on each other. Having modules and a module diagram dependency graph ("depends on" meaning "has to link against") would help quite a bit to make the gammalib API more accessible, I think. Having such well defined module interfaces would probably also result in a simpler API where the different parts are less coupled (or maybe not, I can't tell what depends on what at the moment, I wouldn't be surprised if gammalib is already very well designed).

I also think it might be nice to use a global @Gamma@ namespace instead of prepending every file and class with @G@. I think namespaces are one of the great advantages of C++ over C, because they make it explicit in library client code where a given symbol comes from.

Is there a technical reason (e.g. swig python interface or linker problems) why namespaces are not used?

h2. Use templates in gammalib/fits to avoid code duplication?

For the FITS image and table classes we currently have 10x code duplication:
<pre>
GFitsImage.hpp GFitsImageLong.hpp GFitsImageULong.hpp GFitsTableBoolCol.hpp GFitsTableCol.hpp GFitsTableLongLongCol.hpp GFitsTableUShortCol.hpp
GFitsImageByte.hpp GFitsImageLongLong.hpp GFitsImageUShort.hpp GFitsTableByteCol.hpp GFitsTableDoubleCol.hpp GFitsTableShortCol.hpp
GFitsImageDouble.hpp GFitsImageSByte.hpp GFitsTable.hpp GFitsTableCDoubleCol.hpp GFitsTableFloatCol.hpp GFitsTableStringCol.hpp
GFitsImageFloat.hpp GFitsImageShort.hpp GFitsTableBitCol.hpp GFitsTableCFloatCol.hpp GFitsTableLongCol.hpp GFitsTableULongCol.hpp
</pre>

Wouldn't it make sense to use one C++ template class GFitsImage and GFitsTable instead?
Would that work?
What are the disadvantages?

There's a few other places where templates would help avoid code duplication.

Btw., how about add a rule: "Code duplication should be avoided by using composition, inheritance and templates."
(in case you hadn't noticed, I hate boilerplate and duplicated documentation / code.)

Back