Difference between revisions of "Talk:Coding Standards"

From Sirikata Wiki
Jump to navigation Jump to search
Line 351: Line 351:
 
   #include meters.h
 
   #include meters.h
 
   #include seconds.h
 
   #include seconds.h
 +
  #include m_per_s.h
 
   #include number.h
 
   #include number.h
 
   ............
 
   ............

Revision as of 18:20, 18 August 2009

There is no mention of const-correctness in the standards page. (Sorry, there IS; I just missed it on first read.) Where it talks about the use of get/set prefixes, the example given is incomplete. It says to write,

int getAge();
void setAge( int Age );

It should say to write

int getAge() const;
void setAge( int anAge );

(Besides, someone might expect Age to be a float, or an unsigned, or a size_t, or a special class with years and months and days... There's probably something to be said for choosing a name that better reflects the underlying representation; --NOT talking about "iAge"; rather a name like "YearsOfAge", and its type to be a size_t, unless it can be negative.)

Const-correctness essentially boils down to making const anything that CAN be const. Sometimes the constness of functions is made difficult by intrinsic linkage, such as when objects have a pointer to next, and are part of an LRU cache that should update on reads. Well, such situations call for making that pointer to next mutable; --after all, it is not really a part of the class, philosophically speaking--, rather than compromise constness on the class interface.

Another subject not mentioned is command/query separation. This is an enormoulsy helpful discipline that makes for clean and readable code when adhered to religiously. Command/Query separation says that there are two types of functions that should never be mixed or combined in one: Queries return a value, ALWAYS; and as class member functions are always ...() const. Commands are NEVER const, and ALWAYS return void. And what people new to this concept ALWAYS ask is "what about returning a success or failure status?". Answer is NEVER do that. Failures should be "returned" by throw()- -ing an exception. Status returns have been depracated for years now.

Another subject not addressed is, in fact, exceptions, and the huge importance of marking functions that do NOT throw excetions simply with a comment

void f() // no-throw

Why? Because the big gottcha with exceptions is that you DON'T want them to happen inside a constructor; so by flagging non-throwing functions //no-throw you can, at a glance, tell if you can call it from a ctor.

Another subject not addressed is the use of "explicit" for single arg ctor's.

Another subject not addressed is avoiding like the plague the terrible practice of using special values with floats to convey additional state. This bad practice is a time-bomb, and a documentation nightmare. Someone modifying a piece of code using one such variable cannot be expected to look up the declaration of every variable to scan visually for comments about them; so a variable that can have special values meaning special things should be accompanied by comments about these special values and their meanings everywhere they are used!!! Just pack a bool with the float into a struct, like

struct fuel_capacity
{
   float litres;
   bool this_is_a_station__dogbreath__aint_got_no_fuel_tank;
};

rather than signify no fuel use by a -1. value. Or if you are really keen on the efficiency aspect, at least wrap that float into a class, like

class fuel_capacity
{
  float litres;
public:
  bool is_applicable() const { return litres >= 0.0f );
  float get_capacity() const
  {
    assert( is_applicable() );
    return litres;
  }
  ...
};

Another subject not addressed is variable and function names, and how to make them intuitive and clear. For example, boolean variables should always read like statements that can only be true or false. E.g,

bool light_is_on; bool tank_has_enough_fuel;

etc. A function that returns the state of a light could be similarly named, but with a verb at the front,

bool light::is_on() const;
bool tank::has_enough_fuel_for( trip const & trp ) const;

A function that turns a light on should be named like

void light::turn_on();

Or if a boolean variable controls the light, it could be named,

bool light_enable_switch;

There should NEVER be ambiguously named variables or functions like

light_on //BAD!!! Is this a switch that turns it on?, a status?, or what?

Least of all, something like

int light_on( double x, int y ); // HORROR

Another very good practice not mentioned is avoiding inheritance of concrete classes. All hierarchies trees should be abstract, having a dtor like,

virtual ~aClass() = 0;

with only the leafs being concrete. Adhering to this discipline avoids many dangers associated with automatically generated assignment operators and constructors that can lead to slicing or simply non-sensical assignment across inheritance levels. It also allows you to make concrete class destructors non-virtual, if you know that a concrete class will never be inherited. There'd be many exceptions to this rule, however; the idea is merely to think twice... make that three times.. before having one concrete class inherit another concrete class. Avoid whenever possible.

Also, avoiding virtual functions in the public interface of classes. The preferred method is to have public non-virtual, inline functions that call private virtual functions

class unit
{
public:
  void fly(){ _fly(); }
private:
  virtual void _fly();
};

Exceptionally, a virtual function could be protected:

class unit
{
public:
  void fly(){ _fly(); }
protected:
  virtual void _fly();
};

The semantic difference here is that if you make _fly() private, you are communicating to other programmers that derived class overrides of _fly() need not call nor know anything about the base class' _fly() function. In the latter case, you are communicating to programmers that derived class overrides of _fly() SHOULD call the base-class _fly() function; that is, that they should add to, or "decorate" the original function, rather than replace it. But if you make fly() a public virtual function, you're communicating to other programmers that you just don't know the language and its idioms very well ;-)

Separating "data" from "code" or "behavioral" classes is also a good idea. Classes representing passive data should feature a complete set of initializing constructors and assignment operators, as well as binary operators for comparisons and/or arithmetic if applicable, and explicit initializations to/from base classes; but no other functionality. Behavioral classes should often have a single data member being a pointer to a struct containing the "real" data members; --that is, they should often use the PIMPL idiom, and often should have their default assignment operator and constructor disabled by being declared private and left undefined. Functions in code classes access variables in the struct via the single pointer member. This practice avoids excessive header information exposure. Clients of a PIMPL only need information about its public functions, --not about its data members; and having data members in its main header implies that their type definitions must also be included, which calls for additional header files, and so on, which can often lead to vicious header inclusion circles.

Global space and non-class-function calls should be prefixed :: to make it clear that they are not calls to member functions. This would include functions from static and dynamic libraries. Thus, rather than a = floor( b ), write a = ::floor( b ).

Another very helpful standard to adopt is DBC (Design By Contract). DBC cannot be done fully in C++. Long story. But at least a subset can be achieved. Every class should have a private, inline function void _chk_invariants() that is defined as empty in release mode but has code in debug mode. The debug mode code checks the state of the class for coherence.

class myString
{
  char str[32];
  size_t len;
public:
  ...
private:
  ...
  void _chk_invariants()
  {
    assert( len < 32 );
    assert( len >= 0 );
    assert( ::strlen(str) == len );
  }
};

Then, all class functions can have calls to _chk_invariants(): at the starts of queries and at the ends of commands. Thus, queries make sure that nothing's wrong with the class' state before answering the query; and commands make sure that they leave the class in a coherent state after they execute. Additionally, every function should begin with assertions about its input parameters, to make sure the input they get is within specified range; that pointers are not null, etc. This falls far short of DBC as implemented in the Eiffel language; but it's a start. Note that input check asserts should NOT even look at private class or function variables; only input variables. Why? Because it would be an unfair contract to expect the caller to ensure anything about the state of the class, which it cannot see. Just for kicks, adding assertions before the ends of queries, ensuring that output is within specified range; that pointers returned are not null, etceteras, would be nice. In the case of commands, the call to _chk_invariants() is sufficient. When all classes are built from the ground up with coherence checks and their functions are written with input checks from the beginning, debugging becomes much easier, as the consequences of bugs tend to be caught much sooner and closer to the source of the problem.

Avoid confusing code errors and debugging with run-time exceptions. These are two totally different paradigms and should not be mixed. Checking for possible consequences of code error is what debugging is all about; and the standard way to deal with code errors is by assert()'s. The difference between code errors and exceptions is that code errors are bad things that "should NOT happen", whereas run-time exceptions are bad things that one foresees WILL happen, such as the user having removed a CD that the app needed. The standard way to deal with run-time exceptions is to throw( new disk_not_found() ); then catch the exception at some convenient place up the stack, and deal with it. Avoid using exceptions for debugging; or asserts for run-time exceptions. There's admittedly a gray area, however: Code is never perfect; and sometimes it may be desirable to deal with a code error as an exception rather than to terminate an application; but this should be rarely done; only for types of bugs that are known but seemingly intractable; and should be documented as such. Also, error-logging tends to straddle both worlds; but, by the same token, it may be preferable to have separate files for debugging and exception logs. Just having __ASSERT__ and __THROW__ macros that direct messages to two separate files may be a way to implement this separation. Of course, exception handling could include logging of messages; but a __THROW__ macro could add extra info, such as __FILE__ and line number for debug mode exception logs.

High Performance Coding: It's been said enough, perhaps too much, that code optimization is the last step. I'll just say a couple of things here on the other side of the coin: Sometimes it pays to consider some aspects of optimization up-front. Converting non-multi-threaded code to multi-threaded code is HARD, for example. It may be much easier to design a multi-threaded application multi-threaded from the start. Also, knowing from the beginning if a function is called 12 times per second or 12,000 times per second can give the coder a sense of perspective. AMD has a free performance analysis tool, for Linux and Windows, Code Analyst: http://developer.amd.com/cpu/CodeAnalyst/codeanalystlinux/Pages/default.aspx? It's easy to use, even without reading the documentation, just running it with default settings can give you useful information, such as numbers of times functions are called, which can then be written to a comment in the function header comments block, and periodically updated. One additional benefit of knowing which functions are called the most is that the code can then be refactored such that most of the fast code resides in one file or in a small number number of files. These files can then be flagged to the compiler for "optimize for performance". The rest of the code should be "optimized for size", in order to get the highest performance. This may sound bogus but isn't. Back in the 90's, memory was so much faster than processors that a friend of mine had built a dual processor computer with shared memory by time-slicing memory access. But nowadays the situation is completely reversed: Processors can execute hundreds of instructions... rather thousands, in the time it takes to read a variable from memory. What's saving us from performance hell is cache: L1 cache, L2 cache, L3 chache... The worst performance bottleneck is when data is NOT in any cache, and has to be fetched from memory. That's called a "cache miss". The problem with cache is that it's expensive, and therefore limited in size. The smaller the code, the greater the chances that a particular piece of it will be found in cache. Therefore, for 90%, or 95% of an executable, the best performance optimization is, in fact, size optimization; and "optimize for performance" is best only applied to code that is really in the innermost loops. Note, however, that there is no "data size optimization" settings for any compilers. Compilers won't even touch your data structures, and it is entirely the responsibility of the programmer to optimize data structures for minimum size, --alignment considerations notwithstanding.

Another refrain in code optimization always reminds us of something about QuickSort versus BubbleSort; I forget what it was, now. Indeed, BubbleSort can be 100's of times faster than QuickSort. Don't believe me? Put it this way: Quick Sort boasts O(N*logN), versus O(N^2) for bubble; BUT, the dice are loaded in those tests. The data set is typicall totally randomized. That puts QuickSort in the best light, and BubbleSort in the worst. If our data were already sorted, we'd be catching QuickSort at its worst (still taking O(N*logN) time to do nothing at all), while catching BubbleSort at its best (one quick pass, or O(N), and it exits). Even if the data is only roughly pre-sorted, but we can guarantee sorting in a small number of BubbleSort passes, the performance is still officially O(N), and BubbleSort wins. And not only that, but a pass of BubbleSort also wins the cache coherence benchmark: A linear progression is easy to predict by the automatic prefetching mechanisms of modern CPU's, whereas QuickSort jumps all over the place, resulting in cache misses galore. In a 3D engine, a lot of the sorting needs are indeed incremental, working with pre-sorted data. Suppose you keep an index of objects in a scene sorted by distance to the camera. The first time you order the objects, QuickSort is the thing to use; but on subsequent 'frames', the index is already 99.9% sorted; any two objects in it would swap positions once every so many frames. Thus, a pass of BubbleSort --occasionally two-- are sufficient to restore the sorted status of the index, at a fraction of the processing cost. Conclusion: Use QuickSort where you expect the data to be highly random; but choose BubbleSort otherwise.

I can't think of a better way of titling this section than.. USE THE TYPE SYSTEM What I'm talking about is increasing code safety and cleanliness by using types for things like measures and identifiers. Say, for instance, you have a system of factions, which have a name (string) and ID numbers (ints). First problem with that is you might end up duplicating gazillions of functions, for the sake of accepting string or IDnum identifiers (void updateRelation( string faction1, string faction2 ); void updateRelationInt( int faction1, int faction2 ); etc.). Second problem is that if you use integers as identifiers for groups or crowds, and for cities, and for door-frames, you could conceivably write code that subtracts a group from a faction, adds a city, and assigns the result to a door-frame; and the compiler would compile and link that happily. (Some might think of this as "Power"...). Instead, use a class for Faction (excuse the size of the example and the simplified pseudo-code):

 class Faction
 {
   using namespace std;
   typedef unsigned short ID_t; //16 bits should suffice for faction ID's?
   string name;
   ID_t ID;
   bool initialized;
   ...
   //ID_generator should be a singleton, really; just trying to simplify the example
   class ID_generator : private MultiServerSynchronizedIDGen
   {
     static ...
   public:
     static ID_t getNewID( string aNewName );
   };
   static ID_generator IDgen;
   static hash_map< string, ID_t > StoID_table;
   static array< string > IDtoS_table;
   void chk_invariants() const
   (
     #ifndef NDEBUG
       assert( !initialized || ID && name );
     #endif
   }
 public:
   Faction() : name(), ID(0), initialized(false) {}
   //compiler-generated copy ctor and assignment ok
   bool is_initialized() const { return intialized };
   #ifndef NDEBUG
     ID_t getID() const //ID's not needed by clients; for debugging only
     {
       chk_invariants();
       assert( is_initialized );
       return ID;
     }
   #endif
   string const & getName() const
   {
     chk_invariants();
     assert( is_initialized );
     return name;
   }
   friend bool operator==( Faction const & fac1, Faction const & fac2 )
   {
     assert( fac1.is_initialized() );
     assert( fac2.is_initialized() );
     return fac1.ID == fac2.ID;
   }
   friend bool operator!=( Faction const & fac1, Faction const & fac2 )
   {
     return !fac1==fac2;
   }
   void createNewFaction( string const & aNewName ) //no-throw
   {
     //simplified pseudo-code:
     assert( aNewName.len() > 0 );
     chk_invariants();
     assert( ! initialized );
     assert( ! StoID_table.find( aNewName ) );
     try
     {
       ID = IDgen.getNewID( aNewName );
     }
     catch(...)
     {
       ...
     }
     name = aNewName;
     StoID_table.insert( name, ID );
     IDtoS_table[ID] = ( name );
     initialized = true;
     chk_invariants();
   }
 };

Now, you can use Faction objects in place of names or ID's everywhere a faction needs to be identified; and if you try to add a faction to a group and subtract that from a door-frame, the compiler will refuse. In fact; you can't even add a Faction to a Faction, because the + and += operators are not defined; --only those operators that are needed are defined (== and !=). Furthermore, all the complexities of string and number identifiers are hidden from clients. Faction relations could also be made part of the class, with a new, static hash-table that uses <ID_t,ID_t> pairs as keys. A similar philosophy can be applied to measures (short of using a real SI-units package): Instead of passing around lengths and distances as floats, it is MUCH safer AND clearer to have a

 template <class T> //representation --e.g. float, double...
 class meters
 {
   T num;
 public:
   meters() : num(0.0) {}
   explicit meters( T anum ) : num(anum) {}
   template <clas U>
   explicit meters( U anum ) : num(anum) {}
   //only needed operators:
   .......
 };
 template meters<float>;
 template meters<double>; //or however explicit instantiation was written; I forget
 ............
 ............
 ............
 #include meters.h
 #include seconds.h
 #include m_per_s.h
 #include number.h
 ............
 template< class T >
 number<T> ::operator/( meters<T> m1, meters<T> m2 )
 {
   assert( m2 != 0.0 );
   return m1/m2;
 }
 ............
 template< class T >
 m_per_s<T> ::operator/( meters<T> m, seconds<T> s )
 {
   assert( s != 0.0 );
   return m/s;
 }

Easy to code. Much easier to use than SI-unit packages; --though very limited. But enormously useful in terms of bug-catching and code clarity.