Difference between revisions of "Talk:Coding Standards"

From Sirikata Wiki
Jump to navigation Jump to search
 
(32 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
= Patterns =
 +
 +
== Const correctness ==
 +
 
There is no mention of const-correctness in the standards page. (Sorry, there IS;
 
There is no mention of const-correctness in the standards page. (Sorry, there IS;
 
I just missed it on first read.)
 
I just missed it on first read.)
Line 4: Line 8:
 
given is incomplete.
 
given is incomplete.
 
It says to write,
 
It says to write,
 +
<source lang="cpp">
 
  int getAge();
 
  int getAge();
 
  void setAge( int Age );
 
  void setAge( int Age );
 +
</source>
 
It should say to write
 
It should say to write
  int getAge() '''const''';
+
<source lang="cpp">
 +
  int getAge() const; //<--
 
  void setAge( int anAge );
 
  void setAge( int anAge );
 +
</source>
 
(Besides, someone might expect Age to be a float, or an unsigned, or a size_t, or a
 
(Besides, someone might expect Age to be a float, or an unsigned, or a size_t, or a
 
special class with years and months and days... There's probably something to be said
 
special class with years and months and days... There's probably something to be said
Line 21: Line 29:
 
all, it is not really a part of the class, philosophically speaking--, rather than
 
all, it is not really a part of the class, philosophically speaking--, rather than
 
compromise constness on the class interface.
 
compromise constness on the class interface.
 +
 +
Furthermore, const correctness applies more to software design discipline rather than
 +
peformance as the main article leads one to believe. Although compilers are probably
 +
free to use const modifiers for optimization, the advent of full program optimization
 +
makes this less of an issue (compilers optimizing at link time will already know
 +
a function has no side effects most of the time), its principal rationale is to
 +
formalize a fundamental interface aspect of the specification: that we promise this
 +
function will not, not now and not ever (until we break the interface that is),
 +
generate observable side effects. Const correctness also allow the compiler to enforce
 +
this for us, and therefore catch programming errors earlier.
 +
 +
== Command/query separation ==
  
 
Another subject not mentioned is command/query separation. This is an enormoulsy
 
Another subject not mentioned is command/query separation. This is an enormoulsy
Line 34: Line 54:
 
Another subject not addressed is, in fact, exceptions, and the huge importance of
 
Another subject not addressed is, in fact, exceptions, and the huge importance of
 
marking functions that do NOT throw excetions simply with a comment
 
marking functions that do NOT throw excetions simply with a comment
 +
<source lang="cpp">
 
  void f() // no-throw
 
  void f() // no-throw
 +
</source>
 
Why?
 
Why?
 
Because the big gottcha with exceptions is that you DON'T want them to happen inside
 
Because the big gottcha with exceptions is that you DON'T want them to happen inside
 
a constructor; so by flagging non-throwing functions //no-throw you can, at a glance,
 
a constructor; so by flagging non-throwing functions //no-throw you can, at a glance,
 
tell if you can call it from a ctor.
 
tell if you can call it from a ctor.
 +
 +
== Implicit constructors ==
  
 
Another subject not addressed is the use of "explicit" for single arg ctor's.
 
Another subject not addressed is the use of "explicit" for single arg ctor's.
 +
 +
Single arg constructors are considered differently by C++, when the semantics of the
 +
special constructor is not the intended semantics, the safe way to define it is by
 +
flagging it with the ''explicit'' keyword:
 +
<source lang="cpp">
 +
class List
 +
{
 +
    // Wrong, it will try to use this ctor to cast a size_t into a list
 +
    List(size_t size);
 +
 +
    // Right, people will have to explicitly invoke this ctor
 +
    explicit List(size_t size);
 +
}
 +
</source>
 +
 +
== Formal special values ==
  
 
Another subject not addressed is avoiding '''like the plague''' the terrible practice of
 
Another subject not addressed is avoiding '''like the plague''' the terrible practice of
using special values with floats to convey additional state. This bad practice is a
+
using special values with floats to convey additional state (informal special values).  
 +
This bad practice is a
 
time-bomb, and a documentation nightmare. Someone modifying a piece of code using
 
time-bomb, and a documentation nightmare. Someone modifying a piece of code using
 
one such variable cannot be expected to look up the declaration of every variable to
 
one such variable cannot be expected to look up the declaration of every variable to
Line 50: Line 91:
 
and their meanings everywhere they are used!!! Just pack a bool with the float into
 
and their meanings everywhere they are used!!! Just pack a bool with the float into
 
a struct, like
 
a struct, like
  struct fuel_capacity
+
<source lang="cpp">
 +
  struct FuelCapacity
 
  {
 
  {
 
     float litres;
 
     float litres;
     bool this_is_a_station__dogbreath__aint_got_no_fuel_tank;
+
     bool notApplicable; // this is a station dogbreath aint got no fuel tank
 
  };
 
  };
 +
</source>
 
rather than signify no fuel use by a -1. value.
 
rather than signify no fuel use by a -1. value.
 
Or if you are really keen on the efficiency aspect, at least wrap that float into
 
Or if you are really keen on the efficiency aspect, at least wrap that float into
 
a class, like
 
a class, like
  class fuel_capacity
+
<source lang="cpp">
 +
  class FuelCapacity
 
  {
 
  {
 
   float litres;
 
   float litres;
 
  public:
 
  public:
   bool is_applicable() const { return litres >= 0.0f );
+
   bool isApplicable() const { return litres >= 0.0f );
   float get_capacity() const
+
   float getCapacity() const
 
   {
 
   {
     assert( is_applicable() );
+
     assert( isApplicable() );
 
     return litres;
 
     return litres;
 
   }
 
   }
 
   ...
 
   ...
 
  };
 
  };
 +
</source>
  
Another subject not addressed is variable and function names, and how to make them
+
== Avoiding Concrete Inheritance ==
intuitive and clear. For example, boolean variables should always read like statements
 
that can only be true or false. E.g,
 
bool light_is_on; bool tank_has_enough_fuel;
 
etc.
 
A function that returns the state of a light could be similarly named, but with a verb
 
at the front,
 
bool light::is_on() const;
 
bool tank::has_enough_fuel_for( trip const & trp ) const;
 
A function that turns a light on should be named like
 
void light::turn_on();
 
Or if a boolean variable '''controls''' the light, it could be named,
 
bool light_enable_switch;
 
There should NEVER be ambiguously named variables or functions like
 
light_on //BAD!!! Is this a switch that turns it on?, a status?, or what?
 
Least of all, something like
 
int light_on( double x, int y ); // '''HORROR'''
 
  
 
Another very good practice not mentioned is avoiding inheritance of concrete classes.
 
Another very good practice not mentioned is avoiding inheritance of concrete classes.
 
All hierarchies trees should be abstract, having a dtor like,
 
All hierarchies trees should be abstract, having a dtor like,
 
  virtual ~aClass() = 0;
 
  virtual ~aClass() = 0;
with only the leafs being concrete. Adhering to this discipline avoids many dangers
+
with only the leafs being concrete.  
 +
 
 +
Apart from the known benefits of dependency inversion, adhering to this discipline  
 +
avoids many dangers
 
associated with automatically generated assignment operators and constructors that
 
associated with automatically generated assignment operators and constructors that
 
can lead to slicing or simply non-sensical assignment across inheritance levels. It
 
can lead to slicing or simply non-sensical assignment across inheritance levels. It
Line 99: Line 131:
 
however; the idea is merely to think twice... make that three times.. before having
 
however; the idea is merely to think twice... make that three times.. before having
 
one concrete class inherit another concrete class. Avoid whenever possible.
 
one concrete class inherit another concrete class. Avoid whenever possible.
 +
 +
== Open-Close ==
  
 
Also, avoiding virtual functions in the public interface of classes. The preferred
 
Also, avoiding virtual functions in the public interface of classes. The preferred
 
method is to have public non-virtual, inline functions that call private virtual functions
 
method is to have public non-virtual, inline functions that call private virtual functions
  class unit
+
<source lang="cpp">
 +
  class Unit
 
  {
 
  {
 
  public:
 
  public:
   void fly(){ _fly(); }
+
   void fly(){ flyImpl(); }
 
  private:
 
  private:
   virtual void _fly();
+
   virtual void flyImpl();
 
  };
 
  };
 +
</source>
 
Exceptionally, a virtual function could be protected:
 
Exceptionally, a virtual function could be protected:
  class unit
+
<source lang="cpp">
 +
  class Unit
 
  {
 
  {
 
  public:
 
  public:
   void fly(){ _fly(); }
+
   void fly(){ flyImpl(); }
 
  protected:
 
  protected:
   virtual void _fly();
+
   virtual void flyImpl();
 
  };
 
  };
The semantic difference here is that if you make _fly() private, you are communicating
+
</source>
to other programmers that derived class overrides of _fly() need not call nor know
+
The semantic difference here is that if you make flyImpl() private, you are communicating
anything about the base class' _fly() function.
+
to other programmers that derived class overrides of flyImpl() need not call nor know
 +
anything about the base class' flyImpl() function.
 
In the latter case, you are communicating to programmers that derived class overrides of
 
In the latter case, you are communicating to programmers that derived class overrides of
_fly() SHOULD call the base-class _fly() function; that is, that they should add to, or
+
flyImpl() SHOULD call the base-class flyImpl() function; that is, that they should add to, or
 
"decorate" the original function, rather than replace it.
 
"decorate" the original function, rather than replace it.
 
But if you make fly() a public virtual function, you're communicating to other programmers that
 
But if you make fly() a public virtual function, you're communicating to other programmers that
you just don't know the language and its idioms very well ;-)
+
you just don't know the language and its idioms very well ;-), you're mixing different interface
 +
levels (interface of client code vs. interface of extension code), and breeching the open-close
 +
principle we so love.
 +
 
 +
One thing to take extra care when defining flyImpl() as private is the confusion that will arise
 +
if further derived classes want to stack flyImpl() ''"decorations"'', ie
 +
<source lang="cpp">
 +
class Unit
 +
{
 +
public:
 +
    void fly();
 +
private:
 +
    virtual void flyImpl();
 +
}
 +
 +
class HoverUnit : Unit
 +
{
 +
private:
 +
    virtual void flyImpl() { ... }
 +
}
 +
 
 +
class ShortHoverUnit : HoverUnit
 +
{
 +
private:
 +
    // WTF? Can't do this, have to copy HoverUnit's implementation... bad...
 +
    virtual void flyImpl() { if (not tooLong) HoverUnit.flyImpl(); }
 +
}
 +
</source>
 +
 
 +
So in fact it's probably safer to always declare them protected, unless you really really
 +
know you won't need the above (decorator-like) pattern.
 +
 
 +
 
 +
== Data structures vs. Object classes ==
  
 
Separating "data" from "code" or "behavioral" classes is also a good idea. Classes representing passive data
 
Separating "data" from "code" or "behavioral" classes is also a good idea. Classes representing passive data
Line 138: Line 209:
 
implies that their type definitions must also be included, which calls for additional
 
implies that their type definitions must also be included, which calls for additional
 
header files, and so on, which can often lead to vicious header inclusion circles.
 
header files, and so on, which can often lead to vicious header inclusion circles.
 +
 +
 +
= More on names =
 +
 +
== Descriptive identifiers ==
 +
 +
Another subject not fully addressed is variable and function names, and how to make them
 +
intuitive and clear. For example, boolean variables should always read like statements
 +
that can only be true or false. E.g,
 +
<source lang="cpp">
 +
bool lightIsOn;
 +
bool tankHasEnoughFuel;
 +
</source>
 +
etc.
 +
A function that returns the state of a light could be similarly named, but with a verb
 +
at the front,
 +
<source lang="cpp">
 +
bool light::isOn() const;
 +
bool tank::hasEnoughFuelFor( trip const & trp ) const;
 +
</source>
 +
A function that turns a light on should be named like
 +
<source lang="cpp">
 +
void light::turnOn();
 +
</source>
 +
Or if a boolean variable '''controls''' the light, it could be named,
 +
<source lang="cpp">
 +
bool lightEnableSwitch;
 +
</source>
 +
There should NEVER be ambiguously named variables or functions like
 +
  <source lang="cpp">
 +
lightOn //BAD!!! Is this a switch that turns it on?, a status?, or what?
 +
  </source>
 +
Least of all, something like
 +
<source lang="cpp">
 +
int lightOn( double x, int y ); // '''HORROR'''
 +
</source>
 +
 +
 +
== Global namespace ==
  
 
Global space and non-class-function calls should be prefixed :: to make it clear that they
 
Global space and non-class-function calls should be prefixed :: to make it clear that they
 
are not calls to member functions. This would include functions from static and dynamic
 
are not calls to member functions. This would include functions from static and dynamic
 
libraries. Thus, rather than a = floor( b ), write a = ::floor( b ).
 
libraries. Thus, rather than a = floor( b ), write a = ::floor( b ).
 +
 +
 +
 +
== Command Macros' "do{...}while(0)" wrappers ==
 +
By "command macros" it is meant macros that don't "return" a value; but rather "do" something.
 +
Such macros' definitions should be wrapped in a do{ ... }while(0) (YES; NO semicolon at the end) loop.
 +
 +
SUPER-Bad example, in many ways:
 +
  <source lang="cpp">
 +
  #define lights_off lights=0; //HORROR
 +
</source>
 +
First of all, it is customary to spell macros all-capitals, to remind us that they are macros. Secondly, the name is ambiguous; fails to make clear whether it is a command or a query; and unless a macro's purpose is to re-present an identifier (a variable or constant), it should have function-like parenthesis. Thirdly, there should never be closing semicolons in macros. Fourthly is the present subject matter...
 +
 +
Good example:
 +
  <source lang="cpp">
 +
  #define TURN_LIGHTS_OFF() do{ lights = 0; }while(0)
 +
</source>
 +
 +
Why? Many reasons, from mundane to philosophical; but two that stand out at a practical level: a) Without the
 +
do{...}while(0) wrapper, you could forget to put a semicolon after a macro instance, and the
 +
code might compile, but might do weird things. With the wrapper (notably NOT including a closing semicolon), forgetting a semicolon
 +
after a macro call is guaranteed not to compile. b) Without the do{...}while(0) wrapper, the macro might unintentionally
 +
have a "return value" and compile ok if mistakenly put on the RHS of an assignment,
 +
With the wrapper, the macro will never "return" a value; and placing it as the RHS of an assignment or other operator will
 +
correctly result in a compile error.
 +
 +
= Design by contract =
  
 
Another very helpful standard to adopt is DBC (Design By Contract). DBC cannot be done
 
Another very helpful standard to adopt is DBC (Design By Contract). DBC cannot be done
Line 147: Line 284:
 
a private, inline function void _chk_invariants() that is defined as empty in release mode but
 
a private, inline function void _chk_invariants() that is defined as empty in release mode but
 
has code in debug mode. The debug mode code checks the state of the class for coherence.
 
has code in debug mode. The debug mode code checks the state of the class for coherence.
  class myString
+
<source lang="cpp">
 +
  class ShortString
 
  {
 
  {
 
   char str[32];
 
   char str[32];
Line 162: Line 300:
 
   }
 
   }
 
  };
 
  };
Then, all class functions can have calls to _chk_invariants(): at the starts of queries and
+
</source>
 +
Then, all class functions... Pardon me! All PUBLIC class functions can have calls to _chk_invariants(): at the starts of queries and
 
at the ends of commands. Thus, queries make sure that nothing's wrong with the class' state
 
at the ends of commands. Thus, queries make sure that nothing's wrong with the class' state
 
before answering the query; and commands make sure that they leave the class in a coherent
 
before answering the query; and commands make sure that they leave the class in a coherent
 
state after they execute.
 
state after they execute.
 +
 +
NOTE: Private and protected functions are NOT required to check invariants, and should NOT do so, --as a public function may call many private functions
 +
to accomplish a task, but only upon completion of the task would the invariants be expected to be met.
 +
 
Additionally, every function should begin with assertions about its input parameters, to
 
Additionally, every function should begin with assertions about its input parameters, to
 
make sure the input they get is within specified range; that pointers are not null, etc.
 
make sure the input they get is within specified range; that pointers are not null, etc.
Line 178: Line 321:
 
written with input checks from the beginning, debugging becomes much easier, as the consequences
 
written with input checks from the beginning, debugging becomes much easier, as the consequences
 
of bugs tend to be caught much sooner and closer to the source of the problem.
 
of bugs tend to be caught much sooner and closer to the source of the problem.
 +
 +
Macros can help a lot with making this look more formalized. Consider the following code:
 +
 +
<source lang="cpp">
 +
 +
#ifdef _NDEBUG
 +
 +
#define INVARIANT(condition)
 +
#define INVARIANT_GROUP(name, ...)
 +
#define NAMED_INVARIANT(name, condition)
 +
#define INVARIANTS(...)
 +
#define CHECK_INVARIANTS
 +
#define CHECK_INVARIANTS_FOR(instance)
 +
 +
#else
 +
 +
#define INVARIANT(condition) if (af == 0) af = (condition) ? 0 : #condition; \
 +
 +
#define INVARIANT_GROUP(name, ...) { \
 +
    const char *af = 0 ; \
 +
    __VA_ARGS__ ; \
 +
    if (af != 0) { \
 +
        fprintf(stderr, "Assertion failure (%s): at %s:%d %s\n", \
 +
            name, __FILE__, __LINE__, af); \
 +
        abort(); \
 +
    } \
 +
}
 +
 +
#define NAMED_INVARIANT(name, condition) INVARIANT_GROUP(name, INVARIANT(condition))
 +
 +
#define INVARIANTS(...) \
 +
    void __check_invariants() const \
 +
    { \
 +
        __VA_ARGS__ ; \
 +
    }
 +
 +
#define CHECK_INVARIANTS() do{ __check_invariants(); }while(0)
 +
 +
#define CHECK_INVARIANTS_FOR(instance) instance.CHECK_INVARIANTS()
 +
 +
#endif
 +
 +
class ShortString
 +
{
 +
    // ... private stuff ...
 +
    class Impl; // <-- not sure it works ;)
 +
    auto_ptr<Impl> pImpl;
 +
 +
public:
 +
    INVARIANTS(
 +
        NAMED_INVARIANT( "string is short", getLength() < 32 ),
 +
        NAMED_INVARIANT( "length is nonnegative", getLength() >= 0 )
 +
    )
 +
 +
    bool tellmeSomething(...) const
 +
    {
 +
        CHECK_INVARIANTS();
 +
        ...
 +
        return ...
 +
    }
 +
    void doSomething(...)
 +
    {
 +
        CHECK_INVARIANTS(); //perhaps..
 +
        ...
 +
        CHECK_INVARIANTS(); //definitely
 +
    }
 +
}
 +
</source>
 +
 +
This only requires C99 compliance (gcc 3 and above), and looks cool.
 +
 +
Notice that this describes interface-level invariants, hence the declarative interface, so that
 +
readers of the interface definition have the invariants in a human-friendly notation.
 +
The ''pimpl'' idiom allows implementation invariants to be specified for the pimpl class as well,
 +
so everything is neatly abstracted away.
 +
 +
 +
== Assertions and exceptions ==
  
 
Avoid confusing code errors and debugging with run-time exceptions. These are two totally
 
Avoid confusing code errors and debugging with run-time exceptions. These are two totally
 
different paradigms and should not be mixed. Checking for possible consequences of code
 
different paradigms and should not be mixed. Checking for possible consequences of code
 
error is what debugging is all about; and the standard way to deal with code errors is by
 
error is what debugging is all about; and the standard way to deal with code errors is by
assert()'s. The difference between code errors and exceptions is that code errors are bad things that "should
+
assert()'s.  
 +
 
 +
The difference between code errors and exceptions is that code errors are bad things that "should
 
NOT happen", whereas run-time exceptions are bad things that one foresees WILL happen, such as the user
 
NOT happen", whereas run-time exceptions are bad things that one foresees WILL happen, such as the user
 
having removed a CD that the app needed. The standard way to deal with run-time exceptions is
 
having removed a CD that the app needed. The standard way to deal with run-time exceptions is
 
to throw( new disk_not_found() ); then catch the exception at some convenient place up the
 
to throw( new disk_not_found() ); then catch the exception at some convenient place up the
 
stack, and deal with it. Avoid using exceptions for debugging; or asserts for run-time exceptions.
 
stack, and deal with it. Avoid using exceptions for debugging; or asserts for run-time exceptions.
 +
 
There's admittedly a gray area, however: Code is never perfect; and sometimes it may be
 
There's admittedly a gray area, however: Code is never perfect; and sometimes it may be
 
desirable to deal with a code error as an exception rather than to terminate an application; but
 
desirable to deal with a code error as an exception rather than to terminate an application; but
 
this should be rarely done; only for types of bugs that are known but seemingly intractable; and
 
this should be rarely done; only for types of bugs that are known but seemingly intractable; and
 
should be documented as such.
 
should be documented as such.
 +
 
Also, error-logging tends to straddle both worlds; but, by the same token, it may be preferable
 
Also, error-logging tends to straddle both worlds; but, by the same token, it may be preferable
 
to have separate files for debugging and exception logs. Just having __ASSERT__ and __THROW__
 
to have separate files for debugging and exception logs. Just having __ASSERT__ and __THROW__
Line 197: Line 422:
 
add extra info, such as __FILE__ and line number for debug mode exception logs.
 
add extra info, such as __FILE__ and line number for debug mode exception logs.
  
High Performance Coding: It's been said enough, perhaps too much, that code optimization is
+
A useful guideline is that exception handling is best used to handle recoverable
 +
exceptional conditions. Throwing an exception means: at this point in time I don't know what to do:
 +
I'll throw an exception and let some higher-level code, which should know what to
 +
do, handle the situation.
 +
 
 +
Think about compiling shaders: the operation may fail for unsupported
 +
shader profiles. Does the ''"compile_shader"'' function know what to do?
 +
Nope... it '''should''' throw an exception and let some higher level
 +
code figure out what to do. Like fall back to a simpler shader. Or log
 +
an error and abort.
 +
 
 +
= Performance =
 +
 
 +
'''High Performance Coding''': It's been said enough, perhaps too much, that code optimization is
 
the last step. I'll just say a couple of things here on the other side of the coin: Sometimes
 
the last step. I'll just say a couple of things here on the other side of the coin: Sometimes
 
it pays to consider some aspects of optimization up-front. Converting non-multi-threaded code
 
it pays to consider some aspects of optimization up-front. Converting non-multi-threaded code
Line 227: Line 465:
 
won't even touch your data structures, and it is entirely the responsibility of the programmer to
 
won't even touch your data structures, and it is entirely the responsibility of the programmer to
 
optimize data structures for minimum size, --alignment considerations notwithstanding.
 
optimize data structures for minimum size, --alignment considerations notwithstanding.
 +
 +
Another refrain in code optimization always reminds us of something about QuickSort versus
 +
BubbleSort; I forget what it was, now. Indeed, BubbleSort can be 100's of times faster than
 +
QuickSort. Don't believe me? Put it this way: Quick Sort boasts O(N*logN), versus O(N^2) for
 +
bubble; BUT, the dice are loaded in those tests. The data set is typicall totally randomized. That
 +
puts QuickSort in the best light, and BubbleSort in the worst. If our data were already sorted, we'd
 +
be catching QuickSort at its worst (still taking O(N*logN) time to do nothing at all), while
 +
catching BubbleSort at its best (one quick pass, or O(N), and it exits). Even if the data is
 +
only roughly pre-sorted, but we can guarantee sorting in a small number of BubbleSort passes,
 +
the performance is still officially O(N), and BubbleSort wins. And not only that, but a pass
 +
of BubbleSort also wins the cache coherence benchmark: A linear progression is easy to predict
 +
by the automatic prefetching mechanisms of modern CPU's, whereas QuickSort jumps all over the
 +
place, resulting in cache misses galore. In a 3D engine, a lot of the sorting needs are indeed
 +
incremental, working with pre-sorted data. Suppose you keep an index of objects in a scene
 +
sorted by distance to the camera. The first time you order the objects, QuickSort is the thing
 +
to use; but on subsequent 'frames', the index is already 99.9% sorted; any two objects in it
 +
would swap positions once every so many frames. Thus, a pass of BubbleSort --occasionally two--
 +
are sufficient to restore the sorted status of the index, at a fraction of the processing
 +
cost. Conclusion: Use QuickSort where you expect the data to be highly
 +
random; but choose BubbleSort otherwise.
 +
 +
== Use the STL ==
 +
 +
STL uses introsort, which is an insertion sort for small fragments,
 +
and a quicksort for other. It's said to be provably O(n log n), though
 +
I didn't see such proof (it doesn't mean the assertion is false
 +
though). Also, if you require stable performance and memory efficiency
 +
while sorting, using the STL's ''heapify'' and ''heap_pop'' is a very
 +
simple way to perform a heapsort in ''o(n + n log n)''.
 +
 +
Python uses another sorting method, a very very sophisticated one,
 +
that has a certain overhead over STL's introsort but can perform O(n)
 +
for presorted (or roughly presorted) sequences... timsort they called
 +
it. I believe it merits an STL-like implementation.
 +
 +
In any case the morale is: '''use the libraries'''. Many problems
 +
have been solved by the STL and Boost, in performant and generic
 +
ways. When the STL and/or Boost don't cut it, it's a simple matter
 +
of extending it in STL-like and/or Boost-like fashion, rather than
 +
coding tons of special-purpose code.
 +
 +
= Use the type system =
 +
 +
I can't think of a better way of titling this section than..
 +
'''USE THE TYPE SYSTEM'''
 +
What I'm talking about is increasing code safety and cleanliness by using types for
 +
things like measures and identifiers.
 +
Say, for instance, you have a system of ''factions'', which have a name (string) and ID numbers (ints). First problem with that is you might end up duplicating gazillions of functions, for the sake of accepting string or IDnum identifiers (void updateRelation( string faction1, string faction2 ); void updateRelationInt( int faction1, int faction2 ); etc.). Second problem is that if you use integers as identifiers for groups or crowds, and for cities, and for door-frames, you could conceivably write code that subtracts a group from a faction, adds a city, and assigns the result to a door-frame; and the compiler would compile and link that happily. (Some might think of this as "Power"...).
 +
Instead, use a class for Faction (excuse the size of the example and the simplified pseudo-code...(and I hope someone doesn't think I'm suggesting that all functions should be defined inline in the class declaration; I'm doing so in hopes to make the example clearer and not make you go different places to look up the functions' code...)):
 +
  <source lang="cpp">
 +
  class Faction
 +
  {
 +
    using namespace std;
 +
    typedef unsigned short ID_t; //16 bits should suffice for faction ID's?
 +
    ID_t ID;
 +
    ...
 +
    //ID_generator should be a singleton, really; just trying to simplify the example
 +
    class ID_generator : private MultiServerSynchronizedIDGen
 +
    {
 +
      hash_map< string, ID_t > StoID_table; //local cache duplicating server db
 +
      array< string > IDtoS_table; //local cache duplicating server db
 +
    public:
 +
      ID_t getNewID( string aNewName ) //exceptional violation of command/query
 +
      {
 +
        ID_t id = MultiServerSynchronizedIDGen( aNewName );
 +
        StoID_table.insert( aNewName, id );
 +
        IDtoS_table[ID] = ( aNewName );
 +
      }
 +
      string const & stringFromID( ID_t id ) const { assert( id ); return IDtoS_table[ID]; }
 +
      ID_t IDfromString( string const & s ) const { return StoID_table.find(s).second(); } //or whatever
 +
    };
 +
    static ID_generator IDgen; //should be a singleton, rather than static
 +
  public:
 +
    Faction() : ID(0) {}
 +
    //compiler-generated copy ctor and assignment ok
 +
    bool is_initialized() const { return ID != 0 };
 +
    #ifndef NDEBUG
 +
      ID_t getID() const //ID's not needed by clients; for debugging only
 +
      {
 +
        assert( is_initialized );
 +
        return ID;
 +
      }
 +
    #endif
 +
    string const & getName() const
 +
    {
 +
      assert( is_initialized );
 +
      return IDgen.stringFromID( ID_t );
 +
    }
 +
    friend bool operator==( Faction const & fac1, Faction const & fac2 )
 +
    {
 +
      assert( fac1.is_initialized() );
 +
      assert( fac2.is_initialized() );
 +
      return fac1.ID == fac2.ID;
 +
    }
 +
    friend bool operator!=( Faction const & fac1, Faction const & fac2 )
 +
    {
 +
      return !fac1==fac2;
 +
    }
 +
    void createNewFaction( string const & aNewName ) //no-throw
 +
    {
 +
      //simplified pseudo-code:
 +
      //ensure name is usable
 +
      assert( aNewName.len() > 0 );
 +
      //object should be UN-initialized
 +
      assert( ID == 0 );
 +
      //ensure name doesn't already exist
 +
      assert( IDgen.IDfromString( aNewName ) == 0 );
 +
      //do it!
 +
      try
 +
      {
 +
        ID = IDgen.getNewID( aNewName );
 +
      }
 +
      catch(...)
 +
      {
 +
        ... //resolve exceptions, as this function has to be no-throw
 +
      }
 +
      assert( ID != 0 );
 +
    }
 +
    explicit Faction( string const & aName )
 +
    {
 +
      assert( aName.len() > 0 );
 +
      ID = IDgen.IDfromString( aName );
 +
      if( ID == 0 )
 +
        ID = createNewFaction( aName );
 +
      assert( ID != 0 );
 +
    }
 +
  };
 +
  </source>
 +
Now, you can use Faction objects in place of names or ID's everywhere a faction needs to be identified; and if you try to add a faction to a group and subtract that from a door-frame, the compiler will refuse. In fact; you can't even add a Faction to a Faction, because the + and += operators are not defined; --only those operators that are needed are defined (== and !=).
 +
Furthermore, all the complexities of strings and number identifiers management are hidden from clients.
 +
Faction relations could also be made part of the class, with a new, static hash-table that uses <ID_t,ID_t> pairs as keys.
 +
Also, the Faction class above wouldn't need to be PIMPL'd: it's size is only 2 bytes (16 bits).
 +
 +
Let all conversion constructors be explicit. Let there be no conversion operators.
 +
 +
A similar philosophy can be applied to measures (short of using a real SI-units package): Instead of passing around lengths and distances as floats, it is MUCH safer AND clearer to have like,
 +
  <source lang="cpp">
 +
  template <typename T> //representation --e.g. float, double...
 +
  class meters
 +
  {
 +
    T num;
 +
  public:
 +
    meters() : num(0.0) {}
 +
    explicit meters( T anum ) : num(anum) {}
 +
    template <typename U>
 +
    explicit meters( U anum ) : num(anum) {}
 +
    //only needed operators:
 +
    .......
 +
  };
 +
  template meters<float>;
 +
  template meters<double>; //or however explicit instantiation was written; I forget
 +
  ............
 +
  ............
 +
  ............
 +
  #include meters.h
 +
  #include seconds.h
 +
  #include m_per_s.h
 +
  #include number.h
 +
  ............
 +
  template< typename T >
 +
  number<T> ::operator/( meters<T> m1, meters<T> m2 )
 +
  {
 +
    assert( m2 != 0.0 );
 +
    return m1/m2;
 +
  }
 +
  ............
 +
  template< typename T >
 +
  mps<T> ::operator/( meters<T> m, seconds<T> s )
 +
  {
 +
    assert( s != 0.0 );
 +
    return m/s;
 +
  }
 +
  </source>
 +
Easy to code. Much easier to use than SI-unit packages; --though very limited. But enormously useful in terms
 +
of bug-catching and code clarity.
 +
This is what types were invented for. There's no justification to be passing around raw ints and floats.
 +
 +
A further refinement to making units into data-types, would be to separate absolute from relative measures.
 +
We could leave the meters<T> class above abstract, and derive two concrete classes: meters_position and meters_distance.
 +
Although a position IS a distance (from an origin), the distinction has practical usefulness.
 +
You can define only those operators that make sense, then. Adding a position to another makes no sense, for example, so
 +
the operator,
 +
  <source lang="cpp">
 +
  template< typename T >
 +
  void operator+( meters_position<T>, meters_position<T> ){}
 +
  </source>
 +
returns nothing. Trying to assign its result results in a compile time error.
 +
But subtracting a position from another does make sense, and yields a distance. Thus,
 +
  <source lang="cpp">
 +
  template< typename T >
 +
  meters_distance<T> operator-( meters_position<T> mp1, meters_position<T> mp2 )
 +
  {
 +
    return mp1.val - mp2.val;
 +
  }
 +
  </source>
 +
Similarly,
 +
  <source lang="cpp">
 +
  //a distance minus a distance is a distance
 +
  template< typename T >
 +
  meters_distance<T> operator-( meters_distance<T> md1, meters_distance<T> md2 )
 +
  {
 +
    return md1.val - md2.val;
 +
  }
 +
  //subtracting a distance from a position yields a new position
 +
  template< typename T >
 +
  meters_position<T> operator-( meters_position<T> mp1, meters_distance<T> md2 )
 +
  {
 +
    return mp1.val - md2.val;
 +
  }
 +
  //subtracting a position from a distance makes no sense
 +
  template< typename T >
 +
  void operator-( meters_distance<T>, meters_position<T> ){}
 +
  </source>
 +
Same principle would apply to time-spans (time distances) versus date-times (time-positions).
 +
The Standard Library uses this technique for pointers: There's a type ptr_t and another type ptrdiff_t.
 +
Also note that cross-units operators, such as division of meters by seconds yielding mps, would
 +
only work with meters_distance<T> and seconds_span<T>; NOT with meters_postion<T>, etc.
 +
 +
One gottcha in the "absolute" measures as implemented above is that they assume a single frame of reference. This could be a serious issue for Sirikata, which if I understand correctly, uses multiple coordinate systems. One way to deal with this would be to use an additional template parameter for absolute measures to reference a coordinate system. To put it all together once again,
 +
  <source lang="cpp">
 +
  //ABSTRACT:
 +
  template <typename T>
 +
  class meters
 +
  {
 +
    T val;
 +
  public:
 +
    meters() : val(0.0) {}
 +
    explicit meters( T aval ) : val(aval) {}
 +
    template <typename U>
 +
    explicit meters( U aval ) : val(aval) {}
 +
    ~meters() = 0;
 +
    .......
 +
  };
 +
  //RELATIVE:
 +
  template <typename T>
 +
  class meters_distance : private meters<T> //do NOT inherit further
 +
  {
 +
  public:
 +
    meters_distance() : meters() {}
 +
    explicit meters_distance( T aval ) : meters(aval) {}
 +
    template <typename U>
 +
    explicit meters_distance( U aval ) : meters(aval) {}
 +
    ~meters() = 0;
 +
    ~meters_distance(){} //non-virtual; do NOT inherit further
 +
    .......
 +
  };
 +
  //ABSOLUTE:
 +
  template <typename T, typename C>
 +
  class meters_position : private meters<T> //do NOT inherit further
 +
  {
 +
  public:
 +
    meters_position() : meters() {}
 +
    explicit meters_position( T aval ) : meters(aval) {}
 +
    template <typename U>
 +
    explicit meters_position( U aval ) : meters(aval) {}
 +
    ~meters() = 0;
 +
    ~meters_position(){} //non-virtual; do NOT inherit further
 +
    .......
 +
  };
 +
  ..................................
 +
  //ADDING AND SUBTRACTING MIXED ABSOLUTE AND RELATIVE MEASURES
 +
  //functions below must be declared friend of classes above
 +
  //positions can't be added; function returns void
 +
  template <typename T, typename C>
 +
  void operator+( meters_position<T,C>, meters_position<T,C> ){}
 +
  //subtracting two positions yields a distance
 +
  template <typename T, typename C>
 +
  meters_distance<T> operator-( meters_position<T,C> p1, meters_position<T,C> p2 )
 +
  {
 +
    return meters_distance<T>( p1.val - p2.val );
 +
  }
 +
  //adding a distance to a position yields a new position
 +
  template <typename T, typename C>
 +
  meters_position<T,C> operator+( meters_position<T,C> p, meters_distance<T> d )
 +
  {
 +
    return meters_position<T,C>( p.val + d.val );
 +
  }
 +
  //subtracting a distance from a position yields a new position
 +
  template <typename T, typename C>
 +
  meters_position<T,C> operator-( meters_position<T,C> p, meters_distance<T> d )
 +
  {
 +
    return meters_position<T,C>( p.val - d.val );
 +
  }
 +
  //adding a position to a distance would be legal and equivalent to adding a
 +
  //a distance to a position; but it is an un-intuitive way of writing it, so
 +
  //we disallow it
 +
  template <typename T, typename C>
 +
  void operator+( meters_distance<T>, meters_position<T,C> ){}
 +
  //subtracting a position from a distance makes no sense at all; disabled
 +
  template <typename T, typename C>
 +
  void operator-( meters_distance<T>, meters_position<T,C> ){}
 +
  //adding and subtracting distances yield distances
 +
  template <typename T>
 +
  meters_distance<T> operator+( meters_distance<T> d1, meters_distance<T> d2 )
 +
  {
 +
    return meters_distance<T>( d1.val + d2.val );
 +
  }
 +
  template <typename T>
 +
  meters_distance<T> operator-( meters_distance<T> d1, meters_distance<T> d2 )
 +
  {
 +
    return meters_distance<T>( d1.val - d2.val );
 +
  }
 +
  ..................................
 +
  //CROSS-CONTINUUM OPERATORS ONLY USE RELATIVE MEASURES
 +
  #include meters.h
 +
  #include seconds.h
 +
  #include mps.h
 +
  #include number.h
 +
  ............
 +
  template< typename T >
 +
  number<T> ::operator/( meters_distance<T> m1, meters_distance<T> m2 )
 +
  {
 +
    assert( m2 != 0.0 );
 +
    return m1/m2;
 +
  }
 +
  ............
 +
  template< typename T >
 +
  m_per_s<T> ::operator/( meters_distance<T> m, seconds_span<T> s )
 +
  {
 +
    assert( s != 0.0 );
 +
    return m/s;
 +
  }
 +
  </source>
 +
Note also that now we can define "absolute vector3D" (position3D) and "relative vector3D" (vector3D) as having members, respectively, meters_position<T> and meters_distance<T>
 +
  <source lang="cpp">
 +
  template <typename T, typename C>
 +
  class position3D //meters, where C is the "Coordinate system" or "Continuum"
 +
  {
 +
    meters_position<T,C> x,y,z;
 +
  public:
 +
    .............
 +
  };
 +
  template <typename T>
 +
  class vector3D //meters
 +
  {
 +
    meters_distance<T> x,y,z;
 +
  public:
 +
    .............
 +
  };
 +
  </source>
 +
And the same rules apply when adding/subtracting mixed absolute and relative:
 +
  <source lang="cpp">
 +
  //positions can't be added; function returns void
 +
  template <typename T, typename C>
 +
  void operator+( position3D<T,C>, position3D<T,C> ){}
 +
  //subtracting two positions yields a distance
 +
  template <typename T, typename C>
 +
  vector3D<T> operator-( position3D<T,C> p1, position3D<T,C> p2 )
 +
  {
 +
    return vector3D<T>( p1.x-p2.x, p1.y-p2.y, p1.z-p2.z );
 +
  }
 +
  //adding a distance to a position yields a new position
 +
  template <typename T, typename C>
 +
  position3D<T,C> operator+( position3D<T,C> p, vector3D<T> v )
 +
  {
 +
    return position3D<T,C>( p.x+v.x, p.y+v.y, p.z+v.z );
 +
  }
 +
  //subtracting a distance from a position yields a new position
 +
  template <typename T, typename C>
 +
  position3D<T,C> operator-( position3D<T,C> p, vector3D<T> v )
 +
  {
 +
    return position3D<T,C>( p.x-v.x, p.y-v.y, p.z-v.z );
 +
  }
 +
  //adding a position to a distance would be legal and equivalent to adding a
 +
  //a distance to a position; but it is an un-intuitive way of writing it, so
 +
  //we disallow it
 +
  template <typename T, typename C>
 +
  void operator+( vector3D<T>, position3D<T,C> ){}
 +
  //subtracting a position from a distance makes no sense at all; disabled
 +
  template <typename T, typename C>
 +
  void operator-( vector3D<T>, position3D<T,C> ){}
 +
  //adding and subtracting distances yield distances
 +
  template <typename T>
 +
  vector3D<T> operator+( vector3D<T> v1, vector3D<T> v2 )
 +
  {
 +
    return vector3D<T>( v1.x+v2.x, v1.y+v2.y, v1.z+v2.z );
 +
  }
 +
  template <typename T>
 +
  vector3D<T> operator-( vector3D<T> v1, vector3D<T> v2 )
 +
  {
 +
    return vector3D<T>( v1.x-v2.x, v1.y-v2.y, v1.z-v2.z );
 +
  }
 +
  </source>
 +
Well, more practically, we'd first implement += and -= operators as class functions; and later define external functions that use the += and -= operators. This avoids the necessity of declaring the + and - operators as friend-s. So the code above is just to illustrate the principle of absolute versus relative measures.

Latest revision as of 05:03, 19 December 2009

Patterns

Const correctness

There is no mention of const-correctness in the standards page. (Sorry, there IS; I just missed it on first read.) Where it talks about the use of get/set prefixes, the example given is incomplete. It says to write,

 int getAge();
 void setAge( int Age );

It should say to write

 int getAge() const; //<--
 void setAge( int anAge );

(Besides, someone might expect Age to be a float, or an unsigned, or a size_t, or a special class with years and months and days... There's probably something to be said for choosing a name that better reflects the underlying representation; --NOT talking about "iAge"; rather a name like "YearsOfAge", and its type to be a size_t, unless it can be negative.)

Const-correctness essentially boils down to making const anything that CAN be const. Sometimes the constness of functions is made difficult by intrinsic linkage, such as when objects have a pointer to next, and are part of an LRU cache that should update on reads. Well, such situations call for making that pointer to next mutable; --after all, it is not really a part of the class, philosophically speaking--, rather than compromise constness on the class interface.

Furthermore, const correctness applies more to software design discipline rather than peformance as the main article leads one to believe. Although compilers are probably free to use const modifiers for optimization, the advent of full program optimization makes this less of an issue (compilers optimizing at link time will already know a function has no side effects most of the time), its principal rationale is to formalize a fundamental interface aspect of the specification: that we promise this function will not, not now and not ever (until we break the interface that is), generate observable side effects. Const correctness also allow the compiler to enforce this for us, and therefore catch programming errors earlier.

Command/query separation

Another subject not mentioned is command/query separation. This is an enormoulsy helpful discipline that makes for clean and readable code when adhered to religiously. Command/Query separation says that there are two types of functions that should never be mixed or combined in one: Queries return a value, ALWAYS; and as class member functions are always ...() const. Commands are NEVER const, and ALWAYS return void. And what people new to this concept ALWAYS ask is "what about returning a success or failure status?". Answer is NEVER do that. Failures should be "returned" by throw()- -ing an exception. Status returns have been depracated for years now.

Another subject not addressed is, in fact, exceptions, and the huge importance of marking functions that do NOT throw excetions simply with a comment

 void f() // no-throw

Why? Because the big gottcha with exceptions is that you DON'T want them to happen inside a constructor; so by flagging non-throwing functions //no-throw you can, at a glance, tell if you can call it from a ctor.

Implicit constructors

Another subject not addressed is the use of "explicit" for single arg ctor's.

Single arg constructors are considered differently by C++, when the semantics of the special constructor is not the intended semantics, the safe way to define it is by flagging it with the explicit keyword:

 class List 
 {
     // Wrong, it will try to use this ctor to cast a size_t into a list
     List(size_t size);
 
     // Right, people will have to explicitly invoke this ctor
     explicit List(size_t size);
 }

Formal special values

Another subject not addressed is avoiding like the plague the terrible practice of using special values with floats to convey additional state (informal special values). This bad practice is a time-bomb, and a documentation nightmare. Someone modifying a piece of code using one such variable cannot be expected to look up the declaration of every variable to scan visually for comments about them; so a variable that can have special values meaning special things should be accompanied by comments about these special values and their meanings everywhere they are used!!! Just pack a bool with the float into a struct, like

 struct FuelCapacity
 {
    float litres;
    bool notApplicable; // this is a station dogbreath aint got no fuel tank
 };

rather than signify no fuel use by a -1. value. Or if you are really keen on the efficiency aspect, at least wrap that float into a class, like

 class FuelCapacity
 {
   float litres;
 public:
   bool isApplicable() const { return litres >= 0.0f );
   float getCapacity() const
   {
     assert( isApplicable() );
     return litres;
   }
   ...
 };

Avoiding Concrete Inheritance

Another very good practice not mentioned is avoiding inheritance of concrete classes. All hierarchies trees should be abstract, having a dtor like,

virtual ~aClass() = 0;

with only the leafs being concrete.

Apart from the known benefits of dependency inversion, adhering to this discipline avoids many dangers associated with automatically generated assignment operators and constructors that can lead to slicing or simply non-sensical assignment across inheritance levels. It also allows you to make concrete class destructors non-virtual, if you know that a concrete class will never be inherited. There'd be many exceptions to this rule, however; the idea is merely to think twice... make that three times.. before having one concrete class inherit another concrete class. Avoid whenever possible.

Open-Close

Also, avoiding virtual functions in the public interface of classes. The preferred method is to have public non-virtual, inline functions that call private virtual functions

 class Unit
 {
 public:
   void fly(){ flyImpl(); }
 private:
   virtual void flyImpl();
 };

Exceptionally, a virtual function could be protected:

 class Unit
 {
 public:
   void fly(){ flyImpl(); }
 protected:
   virtual void flyImpl();
 };

The semantic difference here is that if you make flyImpl() private, you are communicating to other programmers that derived class overrides of flyImpl() need not call nor know anything about the base class' flyImpl() function. In the latter case, you are communicating to programmers that derived class overrides of flyImpl() SHOULD call the base-class flyImpl() function; that is, that they should add to, or "decorate" the original function, rather than replace it. But if you make fly() a public virtual function, you're communicating to other programmers that you just don't know the language and its idioms very well ;-), you're mixing different interface levels (interface of client code vs. interface of extension code), and breeching the open-close principle we so love.

One thing to take extra care when defining flyImpl() as private is the confusion that will arise if further derived classes want to stack flyImpl() "decorations", ie

 class Unit
 {
 public:
    void fly();
 private:
    virtual void flyImpl();
 }
 
 class HoverUnit : Unit
 {
 private:
    virtual void flyImpl() { ... }
 }

 class ShortHoverUnit : HoverUnit
 {
 private:
    // WTF? Can't do this, have to copy HoverUnit's implementation... bad...
    virtual void flyImpl() { if (not tooLong) HoverUnit.flyImpl(); }
 }

So in fact it's probably safer to always declare them protected, unless you really really know you won't need the above (decorator-like) pattern.


Data structures vs. Object classes

Separating "data" from "code" or "behavioral" classes is also a good idea. Classes representing passive data should feature a complete set of initializing constructors and assignment operators, as well as binary operators for comparisons and/or arithmetic if applicable, and explicit initializations to/from base classes; but no other functionality. Behavioral classes should often have a single data member being a pointer to a struct containing the "real" data members; --that is, they should often use the PIMPL idiom, and often should have their default assignment operator and constructor disabled by being declared private and left undefined. Functions in code classes access variables in the struct via the single pointer member. This practice avoids excessive header information exposure. Clients of a PIMPL only need information about its public functions, --not about its data members; and having data members in its main header implies that their type definitions must also be included, which calls for additional header files, and so on, which can often lead to vicious header inclusion circles.


More on names

Descriptive identifiers

Another subject not fully addressed is variable and function names, and how to make them intuitive and clear. For example, boolean variables should always read like statements that can only be true or false. E.g,

 bool lightIsOn; 
 bool tankHasEnoughFuel;

etc. A function that returns the state of a light could be similarly named, but with a verb at the front,

 bool light::isOn() const;
 bool tank::hasEnoughFuelFor( trip const & trp ) const;

A function that turns a light on should be named like

 void light::turnOn();

Or if a boolean variable controls the light, it could be named,

 bool lightEnableSwitch;

There should NEVER be ambiguously named variables or functions like

 lightOn //BAD!!! Is this a switch that turns it on?, a status?, or what?

Least of all, something like

 int lightOn( double x, int y ); // '''HORROR'''


Global namespace

Global space and non-class-function calls should be prefixed :: to make it clear that they are not calls to member functions. This would include functions from static and dynamic libraries. Thus, rather than a = floor( b ), write a = ::floor( b ).


Command Macros' "do{...}while(0)" wrappers

By "command macros" it is meant macros that don't "return" a value; but rather "do" something. Such macros' definitions should be wrapped in a do{ ... }while(0) (YES; NO semicolon at the end) loop.

SUPER-Bad example, in many ways:

  #define lights_off lights=0; //HORROR

First of all, it is customary to spell macros all-capitals, to remind us that they are macros. Secondly, the name is ambiguous; fails to make clear whether it is a command or a query; and unless a macro's purpose is to re-present an identifier (a variable or constant), it should have function-like parenthesis. Thirdly, there should never be closing semicolons in macros. Fourthly is the present subject matter...

Good example:

  #define TURN_LIGHTS_OFF() do{ lights = 0; }while(0)

Why? Many reasons, from mundane to philosophical; but two that stand out at a practical level: a) Without the do{...}while(0) wrapper, you could forget to put a semicolon after a macro instance, and the code might compile, but might do weird things. With the wrapper (notably NOT including a closing semicolon), forgetting a semicolon after a macro call is guaranteed not to compile. b) Without the do{...}while(0) wrapper, the macro might unintentionally have a "return value" and compile ok if mistakenly put on the RHS of an assignment, With the wrapper, the macro will never "return" a value; and placing it as the RHS of an assignment or other operator will correctly result in a compile error.

Design by contract

Another very helpful standard to adopt is DBC (Design By Contract). DBC cannot be done fully in C++. Long story. But at least a subset can be achieved. Every class should have a private, inline function void _chk_invariants() that is defined as empty in release mode but has code in debug mode. The debug mode code checks the state of the class for coherence.

 class ShortString
 {
   char str[32];
   size_t len;
 public:
   ...
 private:
   ...
   void _chk_invariants()
   {
     assert( len < 32 );
     assert( len >= 0 );
     assert( ::strlen(str) == len );
   }
 };

Then, all class functions... Pardon me! All PUBLIC class functions can have calls to _chk_invariants(): at the starts of queries and at the ends of commands. Thus, queries make sure that nothing's wrong with the class' state before answering the query; and commands make sure that they leave the class in a coherent state after they execute.

NOTE: Private and protected functions are NOT required to check invariants, and should NOT do so, --as a public function may call many private functions to accomplish a task, but only upon completion of the task would the invariants be expected to be met.

Additionally, every function should begin with assertions about its input parameters, to make sure the input they get is within specified range; that pointers are not null, etc. This falls far short of DBC as implemented in the Eiffel language; but it's a start. Note that input check asserts should NOT even look at private class or function variables; only input variables. Why? Because it would be an unfair contract to expect the caller to ensure anything about the state of the class, which it cannot see. Just for kicks, adding assertions before the ends of queries, ensuring that output is within specified range; that pointers returned are not null, etceteras, would be nice. In the case of commands, the call to _chk_invariants() is sufficient. When all classes are built from the ground up with coherence checks and their functions are written with input checks from the beginning, debugging becomes much easier, as the consequences of bugs tend to be caught much sooner and closer to the source of the problem.

Macros can help a lot with making this look more formalized. Consider the following code:

 #ifdef _NDEBUG

 #define INVARIANT(condition) 
 #define INVARIANT_GROUP(name, ...) 
 #define NAMED_INVARIANT(name, condition)
 #define INVARIANTS(...) 
 #define CHECK_INVARIANTS 
 #define CHECK_INVARIANTS_FOR(instance) 

 #else 

 #define INVARIANT(condition) if (af == 0) af = (condition) ? 0 : #condition; \

 #define INVARIANT_GROUP(name, ...) { \
     const char *af = 0 ; \
     __VA_ARGS__ ; \
     if (af != 0) { \
         fprintf(stderr, "Assertion failure (%s): at %s:%d %s\n", \
             name, __FILE__, __LINE__, af); \
         abort(); \
     } \
 }

 #define NAMED_INVARIANT(name, condition) INVARIANT_GROUP(name, INVARIANT(condition))
 
 #define INVARIANTS(...) \
     void __check_invariants() const \
     { \
         __VA_ARGS__ ; \
     }

 #define CHECK_INVARIANTS() do{ __check_invariants(); }while(0)

 #define CHECK_INVARIANTS_FOR(instance) instance.CHECK_INVARIANTS()

 #endif

 class ShortString
 {
    // ... private stuff ...
    class Impl; // <-- not sure it works ;)
    auto_ptr<Impl> pImpl;

 public:
    INVARIANTS(
        NAMED_INVARIANT( "string is short", getLength() < 32 ),
        NAMED_INVARIANT( "length is nonnegative", getLength() >= 0 )
    )

    bool tellmeSomething(...) const
    {
        CHECK_INVARIANTS();
        ...
        return ...
    }
    void doSomething(...)
    {
        CHECK_INVARIANTS(); //perhaps..
        ...
        CHECK_INVARIANTS(); //definitely
    }
 }

This only requires C99 compliance (gcc 3 and above), and looks cool.

Notice that this describes interface-level invariants, hence the declarative interface, so that readers of the interface definition have the invariants in a human-friendly notation. The pimpl idiom allows implementation invariants to be specified for the pimpl class as well, so everything is neatly abstracted away.


Assertions and exceptions

Avoid confusing code errors and debugging with run-time exceptions. These are two totally different paradigms and should not be mixed. Checking for possible consequences of code error is what debugging is all about; and the standard way to deal with code errors is by assert()'s.

The difference between code errors and exceptions is that code errors are bad things that "should NOT happen", whereas run-time exceptions are bad things that one foresees WILL happen, such as the user having removed a CD that the app needed. The standard way to deal with run-time exceptions is to throw( new disk_not_found() ); then catch the exception at some convenient place up the stack, and deal with it. Avoid using exceptions for debugging; or asserts for run-time exceptions.

There's admittedly a gray area, however: Code is never perfect; and sometimes it may be desirable to deal with a code error as an exception rather than to terminate an application; but this should be rarely done; only for types of bugs that are known but seemingly intractable; and should be documented as such.

Also, error-logging tends to straddle both worlds; but, by the same token, it may be preferable to have separate files for debugging and exception logs. Just having __ASSERT__ and __THROW__ macros that direct messages to two separate files may be a way to implement this separation. Of course, exception handling could include logging of messages; but a __THROW__ macro could add extra info, such as __FILE__ and line number for debug mode exception logs.

A useful guideline is that exception handling is best used to handle recoverable exceptional conditions. Throwing an exception means: at this point in time I don't know what to do: I'll throw an exception and let some higher-level code, which should know what to do, handle the situation.

Think about compiling shaders: the operation may fail for unsupported shader profiles. Does the "compile_shader" function know what to do? Nope... it should throw an exception and let some higher level code figure out what to do. Like fall back to a simpler shader. Or log an error and abort.

Performance

High Performance Coding: It's been said enough, perhaps too much, that code optimization is the last step. I'll just say a couple of things here on the other side of the coin: Sometimes it pays to consider some aspects of optimization up-front. Converting non-multi-threaded code to multi-threaded code is HARD, for example. It may be much easier to design a multi-threaded application multi-threaded from the start. Also, knowing from the beginning if a function is called 12 times per second or 12,000 times per second can give the coder a sense of perspective. AMD has a free performance analysis tool, for Linux and Windows, Code Analyst: http://developer.amd.com/cpu/CodeAnalyst/codeanalystlinux/Pages/default.aspx? It's easy to use, even without reading the documentation, just running it with default settings can give you useful information, such as numbers of times functions are called, which can then be written to a comment in the function header comments block, and periodically updated. One additional benefit of knowing which functions are called the most is that the code can then be refactored such that most of the fast code resides in one file or in a small number number of files. These files can then be flagged to the compiler for "optimize for performance". The rest of the code should be "optimized for size", in order to get the highest performance. This may sound bogus but isn't. Back in the 90's, memory was so much faster than processors that a friend of mine had built a dual processor computer with shared memory by time-slicing memory access. But nowadays the situation is completely reversed: Processors can execute hundreds of instructions... rather thousands, in the time it takes to read a variable from memory. What's saving us from performance hell is cache: L1 cache, L2 cache, L3 chache... The worst performance bottleneck is when data is NOT in any cache, and has to be fetched from memory. That's called a "cache miss". The problem with cache is that it's expensive, and therefore limited in size. The smaller the code, the greater the chances that a particular piece of it will be found in cache. Therefore, for 90%, or 95% of an executable, the best performance optimization is, in fact, size optimization; and "optimize for performance" is best only applied to code that is really in the innermost loops. Note, however, that there is no "data size optimization" settings for any compilers. Compilers won't even touch your data structures, and it is entirely the responsibility of the programmer to optimize data structures for minimum size, --alignment considerations notwithstanding.

Another refrain in code optimization always reminds us of something about QuickSort versus BubbleSort; I forget what it was, now. Indeed, BubbleSort can be 100's of times faster than QuickSort. Don't believe me? Put it this way: Quick Sort boasts O(N*logN), versus O(N^2) for bubble; BUT, the dice are loaded in those tests. The data set is typicall totally randomized. That puts QuickSort in the best light, and BubbleSort in the worst. If our data were already sorted, we'd be catching QuickSort at its worst (still taking O(N*logN) time to do nothing at all), while catching BubbleSort at its best (one quick pass, or O(N), and it exits). Even if the data is only roughly pre-sorted, but we can guarantee sorting in a small number of BubbleSort passes, the performance is still officially O(N), and BubbleSort wins. And not only that, but a pass of BubbleSort also wins the cache coherence benchmark: A linear progression is easy to predict by the automatic prefetching mechanisms of modern CPU's, whereas QuickSort jumps all over the place, resulting in cache misses galore. In a 3D engine, a lot of the sorting needs are indeed incremental, working with pre-sorted data. Suppose you keep an index of objects in a scene sorted by distance to the camera. The first time you order the objects, QuickSort is the thing to use; but on subsequent 'frames', the index is already 99.9% sorted; any two objects in it would swap positions once every so many frames. Thus, a pass of BubbleSort --occasionally two-- are sufficient to restore the sorted status of the index, at a fraction of the processing cost. Conclusion: Use QuickSort where you expect the data to be highly random; but choose BubbleSort otherwise.

Use the STL

STL uses introsort, which is an insertion sort for small fragments, and a quicksort for other. It's said to be provably O(n log n), though I didn't see such proof (it doesn't mean the assertion is false though). Also, if you require stable performance and memory efficiency while sorting, using the STL's heapify and heap_pop is a very simple way to perform a heapsort in o(n + n log n).

Python uses another sorting method, a very very sophisticated one, that has a certain overhead over STL's introsort but can perform O(n) for presorted (or roughly presorted) sequences... timsort they called it. I believe it merits an STL-like implementation.

In any case the morale is: use the libraries. Many problems have been solved by the STL and Boost, in performant and generic ways. When the STL and/or Boost don't cut it, it's a simple matter of extending it in STL-like and/or Boost-like fashion, rather than coding tons of special-purpose code.

Use the type system

I can't think of a better way of titling this section than.. USE THE TYPE SYSTEM What I'm talking about is increasing code safety and cleanliness by using types for things like measures and identifiers. Say, for instance, you have a system of factions, which have a name (string) and ID numbers (ints). First problem with that is you might end up duplicating gazillions of functions, for the sake of accepting string or IDnum identifiers (void updateRelation( string faction1, string faction2 ); void updateRelationInt( int faction1, int faction2 ); etc.). Second problem is that if you use integers as identifiers for groups or crowds, and for cities, and for door-frames, you could conceivably write code that subtracts a group from a faction, adds a city, and assigns the result to a door-frame; and the compiler would compile and link that happily. (Some might think of this as "Power"...). Instead, use a class for Faction (excuse the size of the example and the simplified pseudo-code...(and I hope someone doesn't think I'm suggesting that all functions should be defined inline in the class declaration; I'm doing so in hopes to make the example clearer and not make you go different places to look up the functions' code...)):

  class Faction
  {
    using namespace std;
    typedef unsigned short ID_t; //16 bits should suffice for faction ID's?
    ID_t ID;
    ...
    //ID_generator should be a singleton, really; just trying to simplify the example
    class ID_generator : private MultiServerSynchronizedIDGen
    {
      hash_map< string, ID_t > StoID_table; //local cache duplicating server db
      array< string > IDtoS_table; //local cache duplicating server db
    public:
      ID_t getNewID( string aNewName ) //exceptional violation of command/query
      {
        ID_t id = MultiServerSynchronizedIDGen( aNewName );
        StoID_table.insert( aNewName, id );
        IDtoS_table[ID] = ( aNewName );
      }
      string const & stringFromID( ID_t id ) const { assert( id ); return IDtoS_table[ID]; }
      ID_t IDfromString( string const & s ) const { return StoID_table.find(s).second(); } //or whatever
    };
    static ID_generator IDgen; //should be a singleton, rather than static
  public:
    Faction() : ID(0) {}
    //compiler-generated copy ctor and assignment ok
    bool is_initialized() const { return ID != 0 };
    #ifndef NDEBUG
      ID_t getID() const //ID's not needed by clients; for debugging only
      {
        assert( is_initialized );
        return ID;
      }
    #endif
    string const & getName() const
    {
      assert( is_initialized );
      return IDgen.stringFromID( ID_t );
    }
    friend bool operator==( Faction const & fac1, Faction const & fac2 )
    {
      assert( fac1.is_initialized() );
      assert( fac2.is_initialized() );
      return fac1.ID == fac2.ID;
    }
    friend bool operator!=( Faction const & fac1, Faction const & fac2 )
    {
      return !fac1==fac2;
    }
    void createNewFaction( string const & aNewName ) //no-throw
    {
      //simplified pseudo-code:
      //ensure name is usable
      assert( aNewName.len() > 0 );
      //object should be UN-initialized
      assert( ID == 0 );
      //ensure name doesn't already exist
      assert( IDgen.IDfromString( aNewName ) == 0 );
      //do it!
      try
      {
        ID = IDgen.getNewID( aNewName );
      }
      catch(...)
      {
        ... //resolve exceptions, as this function has to be no-throw
      }
      assert( ID != 0 );
    }
    explicit Faction( string const & aName )
    {
      assert( aName.len() > 0 );
      ID = IDgen.IDfromString( aName );
      if( ID == 0 )
        ID = createNewFaction( aName );
      assert( ID != 0 );
    }
  };

Now, you can use Faction objects in place of names or ID's everywhere a faction needs to be identified; and if you try to add a faction to a group and subtract that from a door-frame, the compiler will refuse. In fact; you can't even add a Faction to a Faction, because the + and += operators are not defined; --only those operators that are needed are defined (== and !=). Furthermore, all the complexities of strings and number identifiers management are hidden from clients. Faction relations could also be made part of the class, with a new, static hash-table that uses <ID_t,ID_t> pairs as keys. Also, the Faction class above wouldn't need to be PIMPL'd: it's size is only 2 bytes (16 bits).

Let all conversion constructors be explicit. Let there be no conversion operators.

A similar philosophy can be applied to measures (short of using a real SI-units package): Instead of passing around lengths and distances as floats, it is MUCH safer AND clearer to have like,

  template <typename T> //representation --e.g. float, double...
  class meters
  {
    T num;
  public:
    meters() : num(0.0) {}
    explicit meters( T anum ) : num(anum) {}
    template <typename U>
    explicit meters( U anum ) : num(anum) {}
    //only needed operators:
    .......
  };
  template meters<float>;
  template meters<double>; //or however explicit instantiation was written; I forget
  ............
  ............
  ............
  #include meters.h
  #include seconds.h
  #include m_per_s.h
  #include number.h
  ............
  template< typename T >
  number<T> ::operator/( meters<T> m1, meters<T> m2 )
  {
    assert( m2 != 0.0 );
    return m1/m2;
  }
  ............
  template< typename T >
  mps<T> ::operator/( meters<T> m, seconds<T> s )
  {
    assert( s != 0.0 );
    return m/s;
  }

Easy to code. Much easier to use than SI-unit packages; --though very limited. But enormously useful in terms of bug-catching and code clarity. This is what types were invented for. There's no justification to be passing around raw ints and floats.

A further refinement to making units into data-types, would be to separate absolute from relative measures. We could leave the meters<T> class above abstract, and derive two concrete classes: meters_position and meters_distance. Although a position IS a distance (from an origin), the distinction has practical usefulness. You can define only those operators that make sense, then. Adding a position to another makes no sense, for example, so the operator,

  template< typename T >
  void operator+( meters_position<T>, meters_position<T> ){}

returns nothing. Trying to assign its result results in a compile time error. But subtracting a position from another does make sense, and yields a distance. Thus,

  template< typename T >
  meters_distance<T> operator-( meters_position<T> mp1, meters_position<T> mp2 )
  {
    return mp1.val - mp2.val;
  }

Similarly,

  //a distance minus a distance is a distance
  template< typename T >
  meters_distance<T> operator-( meters_distance<T> md1, meters_distance<T> md2 )
  {
    return md1.val - md2.val;
  }
  //subtracting a distance from a position yields a new position
  template< typename T >
  meters_position<T> operator-( meters_position<T> mp1, meters_distance<T> md2 )
  {
    return mp1.val - md2.val;
  }
  //subtracting a position from a distance makes no sense
  template< typename T >
  void operator-( meters_distance<T>, meters_position<T> ){}

Same principle would apply to time-spans (time distances) versus date-times (time-positions). The Standard Library uses this technique for pointers: There's a type ptr_t and another type ptrdiff_t. Also note that cross-units operators, such as division of meters by seconds yielding mps, would only work with meters_distance<T> and seconds_span<T>; NOT with meters_postion<T>, etc.

One gottcha in the "absolute" measures as implemented above is that they assume a single frame of reference. This could be a serious issue for Sirikata, which if I understand correctly, uses multiple coordinate systems. One way to deal with this would be to use an additional template parameter for absolute measures to reference a coordinate system. To put it all together once again,

  //ABSTRACT:
  template <typename T>
  class meters
  {
    T val;
  public:
    meters() : val(0.0) {}
    explicit meters( T aval ) : val(aval) {}
    template <typename U>
    explicit meters( U aval ) : val(aval) {}
    ~meters() = 0;
    .......
  };
  //RELATIVE:
  template <typename T>
  class meters_distance : private meters<T> //do NOT inherit further
  {
  public:
    meters_distance() : meters() {}
    explicit meters_distance( T aval ) : meters(aval) {}
    template <typename U>
    explicit meters_distance( U aval ) : meters(aval) {}
    ~meters() = 0;
    ~meters_distance(){} //non-virtual; do NOT inherit further
    .......
  };
  //ABSOLUTE:
  template <typename T, typename C>
  class meters_position : private meters<T> //do NOT inherit further
  {
  public:
    meters_position() : meters() {}
    explicit meters_position( T aval ) : meters(aval) {}
    template <typename U>
    explicit meters_position( U aval ) : meters(aval) {}
    ~meters() = 0;
    ~meters_position(){} //non-virtual; do NOT inherit further
    .......
  };
  ..................................
  //ADDING AND SUBTRACTING MIXED ABSOLUTE AND RELATIVE MEASURES
  //functions below must be declared friend of classes above
  //positions can't be added; function returns void
  template <typename T, typename C>
  void operator+( meters_position<T,C>, meters_position<T,C> ){}
  //subtracting two positions yields a distance
  template <typename T, typename C>
  meters_distance<T> operator-( meters_position<T,C> p1, meters_position<T,C> p2 )
  {
    return meters_distance<T>( p1.val - p2.val );
  }
  //adding a distance to a position yields a new position
  template <typename T, typename C>
  meters_position<T,C> operator+( meters_position<T,C> p, meters_distance<T> d )
  {
    return meters_position<T,C>( p.val + d.val );
  }
  //subtracting a distance from a position yields a new position
  template <typename T, typename C>
  meters_position<T,C> operator-( meters_position<T,C> p, meters_distance<T> d )
  {
    return meters_position<T,C>( p.val - d.val );
  }
  //adding a position to a distance would be legal and equivalent to adding a
  //a distance to a position; but it is an un-intuitive way of writing it, so
  //we disallow it
  template <typename T, typename C>
  void operator+( meters_distance<T>, meters_position<T,C> ){}
  //subtracting a position from a distance makes no sense at all; disabled
  template <typename T, typename C>
  void operator-( meters_distance<T>, meters_position<T,C> ){}
  //adding and subtracting distances yield distances
  template <typename T>
  meters_distance<T> operator+( meters_distance<T> d1, meters_distance<T> d2 )
  {
    return meters_distance<T>( d1.val + d2.val );
  }
  template <typename T>
  meters_distance<T> operator-( meters_distance<T> d1, meters_distance<T> d2 )
  {
    return meters_distance<T>( d1.val - d2.val );
  }
  ..................................
  //CROSS-CONTINUUM OPERATORS ONLY USE RELATIVE MEASURES
  #include meters.h
  #include seconds.h
  #include mps.h
  #include number.h
  ............
  template< typename T >
  number<T> ::operator/( meters_distance<T> m1, meters_distance<T> m2 )
  {
    assert( m2 != 0.0 );
    return m1/m2;
  }
  ............
  template< typename T >
  m_per_s<T> ::operator/( meters_distance<T> m, seconds_span<T> s )
  {
    assert( s != 0.0 );
    return m/s;
  }

Note also that now we can define "absolute vector3D" (position3D) and "relative vector3D" (vector3D) as having members, respectively, meters_position<T> and meters_distance<T>

  template <typename T, typename C>
  class position3D //meters, where C is the "Coordinate system" or "Continuum"
  {
    meters_position<T,C> x,y,z;
  public:
    .............
  };
  template <typename T>
  class vector3D //meters
  {
    meters_distance<T> x,y,z;
  public:
    .............
  };

And the same rules apply when adding/subtracting mixed absolute and relative:

  //positions can't be added; function returns void
  template <typename T, typename C>
  void operator+( position3D<T,C>, position3D<T,C> ){}
  //subtracting two positions yields a distance
  template <typename T, typename C>
  vector3D<T> operator-( position3D<T,C> p1, position3D<T,C> p2 )
  {
    return vector3D<T>( p1.x-p2.x, p1.y-p2.y, p1.z-p2.z );
  }
  //adding a distance to a position yields a new position
  template <typename T, typename C>
  position3D<T,C> operator+( position3D<T,C> p, vector3D<T> v )
  {
    return position3D<T,C>( p.x+v.x, p.y+v.y, p.z+v.z );
  }
  //subtracting a distance from a position yields a new position
  template <typename T, typename C>
  position3D<T,C> operator-( position3D<T,C> p, vector3D<T> v )
  {
    return position3D<T,C>( p.x-v.x, p.y-v.y, p.z-v.z );
  }
  //adding a position to a distance would be legal and equivalent to adding a
  //a distance to a position; but it is an un-intuitive way of writing it, so
  //we disallow it
  template <typename T, typename C>
  void operator+( vector3D<T>, position3D<T,C> ){}
  //subtracting a position from a distance makes no sense at all; disabled
  template <typename T, typename C>
  void operator-( vector3D<T>, position3D<T,C> ){}
  //adding and subtracting distances yield distances
  template <typename T>
  vector3D<T> operator+( vector3D<T> v1, vector3D<T> v2 )
  {
    return vector3D<T>( v1.x+v2.x, v1.y+v2.y, v1.z+v2.z );
  }
  template <typename T>
  vector3D<T> operator-( vector3D<T> v1, vector3D<T> v2 )
  {
    return vector3D<T>( v1.x-v2.x, v1.y-v2.y, v1.z-v2.z );
  }

Well, more practically, we'd first implement += and -= operators as class functions; and later define external functions that use the += and -= operators. This avoids the necessity of declaring the + and - operators as friend-s. So the code above is just to illustrate the principle of absolute versus relative measures.