Archive for the ‘C++’ Category

The transitive power of C++’s const keyword

Thursday, February 10th, 2005

In my view, one of the most underused aspects of C++ is the idea of const methods and pointers. Like the private and protected keywords, they place a certain form of access control on an object. But unlike private and protected, they carry this element of control deeply through the call graph.

Imagine that you are designing an architecture where you have base classes A and B. Objects derived from B are not supposed to be able to access the BCannotTrigger() method of A objects…however they must be able to use the BCanTrigger() method. Objects derived from A need access to both BCannotTrigger() and BCanTrigger():

class A
   {
   ...
protected:
   // can't let classes derived from B invoke this
   virtual void BCannotTrigger();
 
public:
   // it's ok for classes derived from B to use this
   virtual void BCanTrigger(); 
   ...
   };
 
class B
   {
   ...
   // implement in your derived class
   virtual void Callback(A& aInput) = 0;
   ...
   };

This looks good on first inspection, but it offers a rather “shallow” protection. Imagine a class derived from A:

class SubclassOfA : public A
  {
  void BCanTrigger()
     {
     // do some stuff
     ...
 
     // now call useful routine
     BCannotTrigger();
     } 
  }

Unwittingly, the programmer has given B a back door into A. If you think it should be obvious that they were making a mistake, bear in mind that I gave these methods ridiculous names to make a point—in reality they’ll be named based on what they do, rather than who can call them. Secondly, this call could be much deeper inside a subroutine…so even if there was fair warning that this was not supposed to happen, someone could lose track.

I am frequently interested in stopping these “deep” transitive violations of access privileges, as opposed to the “shallow” questions of direct calls. Breaking it into more classes with more private and protected bits won’t help—crossing the method call boundary will throw away the crucial context information. It’s possible to put in run-time checks which catch this sort of violation, but C++ offers us one oddly-shaped tool for attacking this at compile-time: the const modifier!

Most people know const can be used to declare constant values in C++:

const float pi = 3.14; // three digit approximation will suffice

But when you start putting the keyword on methods, there’s a neat check the compiler does. A client holding a const pointer can’t call any methods for that class that aren’t marked with const. This applies transitively, because the body of a method marked with const can’t call methods on itself that aren’t also marked with const. Let’s take a look at how this might be a way of solving the situation described above:

class A
   {
   ...
public:
   virtual void BCannotTrigger();
   virtual void BCanTrigger() const;
   ...
   };
 
class B
   {
   ...
   virtual void Callback(const A& aInput) = 0;
   ...
   };

Suddenly, we are protected from the implementation of BCanTrigger() ever accidentally using BCannotTrigger(). It doesn’t matter how convoluted the call graph gets—the pointer that B has is fundamentally incapable of ever being the kind of pointer that can be used to generate a downstream call to BCannotTrigger()! Plus, the const property can be applied to each individual variable and parameter in your program, so you can completely control the granularity of these privileges.

Yet I said it was an “oddly-shaped” tool. Contractually we must recognize that C++ programmers expect that const objects don’t change “essential state” in-between method calls. This idea is semi-useful for optimization and documentation, yet it’s nowhere near as powerful as the compile-time contract validation demonstrated above—which most developers don’t think about. That’s why so many of them think const wastes time and don’t use it—they just haven’t seen what it can really do!

If we wanted to be sneaky, we could make all our member variables volatile. Then we could use const and non-const to represent a completely arbitrary “object mode bit”—checked and enforced in the transitive call graph by the compiler. I’ve been tempted to do this and throw the whole notion of “const is for constantness” out the window…but that would be a bad practice given the understandings already established in the C++ community. So don’t do that!

However, when you create a Widget, consider making a strong and interesting choice about what a const or non-const Widget might conceptually represent. Think big: a const TextFile object could have read-only access, while a non-const one could have read/write-access. Examples like this are unconventional but not far-fetched, and your architecture can really start enforcing deep contracts at compile time (without the explosion of parallel code and client hassle that happens with separate ReadOnlyTextFile and ReadWriteTextFile objects).

With a little cleverness and a little luck, you might get much more mileage out of const than you ever expected!

Some C++ Code Formatting Thoughts

Saturday, January 1st, 2005

There’s a great page on Wikipedia where people are gathering together information about programming style But here are some notes on formatting that I put together a few years ago, which shares a few of my ideas.

Overview

Adopting a system of formatting is not just a means of keeping collaborators from constantly reformatting each others files. Uniformity can be a vehicle for helping those who are approaching a new program’s source to absorb it faster. It can also make humdrum decisions for developing on that code base flow more automatically, which avoids wasting valuable time which could be spent working on the design.

All told, the conventions which are chosen are far less important than agreeing upon a system. I have faith that someday we will use development tools which abstract us from the legacy of ASCII and let us focus on the computational intent of our projects, rendering our programs to us however we want to see them. Until that day comes, an agreement must be forged for each development group that shares text files among multiple people.

Indentation

Wikipedia has a great survey of indentation styles. My own indentation favorite for a long time was the Whitesmith’s style (though I didn’t know it was called that):

void SomeFunction()
   {
   if (condition)
      {
      TakeAction1();
      TakeAction2();
      }
   }

The article sums up the advantages, and why it rose to prominence in certain domains (though it’s not so popular today):

The advantages of this style are similar to those of the Allman style in that blocks are clearly set apart from control statements. However with Whitesmiths style, the block is still visually connected to its control statement instead of looking like an unrelated block of code surrounded by whitespace. Another advantage is that the alignment of the braces with the block emphasizes the fact that the entire block is conceptually (as well as programmatically) a single compound statement. Furthermore, indenting the braces emphasizes that they are subordinate to the control statement.

I still like it the best, and it’s what I use if I’m starting something from scratch for myself. If I could go back in time and convince everyone to use it…I might. But though I do think there’s a reasonable case for it visually, there are rationales that sometimes transcend the visual. Such is the case when GNU developers format their code like this:

void SomeFunction()
{
  if (condition)
    {
      TakeAction1();
      TakeAction2();
    }
}

Which isn’t even consistent—the indentation rule for the function is asymmetric to the indentation for the block inside the if statement. Yet many important GNU developers use Emacs, where parentheses and braces in the first column are called “defun”s (for define function) and are used for automatic navigation, and other forms of processing. If you develop on GNU source and try to do it any other way, they will reformat your changes.

Whatever style I’m using, I prefer braces to have their own lines in order to call out the importance of a scope. The addition or removal of a scope is an important programming act in C++ because they trigger destructors. As an added benefit of having braces on their own line, you can prevent a spurious difference on the condition line if you change between the above and:

void SomeFunction()
   {
   if (condition)
      TakeAction1();
   }

The condition didn’t change. Why should that line be flagged with a diff?

Padding

Generally speaking, I don’t like to treat source code as “two-dimensional”…where the programmer is concerned with the absolute location of code on the screen. This means that I don’t use padding—either with spaces or tabs—to specifically align variable declarations in columns. For example, I would not write:

void SomeFunction()
   {
   int               IntegerVariable;
   float             FloatVariable;
   LPDIRECT3DDEVICE8 lpd3d;
   ...
   }

There’s obviously a lot of maintenance involved: each time you add or remove a variable you have to manually update the spacing. As with most questionable-value formatting decisions, I suggest keeping it simple and low-maintenance…shifting my effort towards the code itself. Readability can be helped in a much less labor-intensive fashion by merely using a proportional font (instead of a fixed-width font like Courier):

   void SomeFunction()
      {
      int IntegerVariable;
      float FloatVariable;
      LPDIRECT3DDEVICE8 lpd3d;
      …
      }

Many professional code editors now provide this feature. If you are still using a fixed-width font for editing source code, I suggest you try changing to something like Verdana or Times New Roman for a few days. (These work better than Arial because the typeface distinguishes lowercase “L” from uppercase “I”.) See if you’re not convinced.

Of course, that only works if you use genuine “tab” characters instead of spaces (it surprises me that not everyone does, but apparently there are some holdouts). It frees you from having to force your programming team to use any particular number of spaces for indentation, they can just choose a way to view tabs in their editor. Indenting and outdenting takes only one keypress instead of N (where N is your tab width).

So if you are indenting with spaces, stop that!!

Line breaks

I put line breaks in a few specific places which could grow into a long list of what are essentially statements. Member initialization is the case that comes to mind:

SomeClass::SomeClass(float initial_value, char* name) :
   BaseClass (universal_constant * Sine(initial_value)),
   member_name (MakeSafeString(name))
   {
   ...
   }

Generally, though, I avoid using line breaks. Trying to fit the code into an 80 column view and keeping the line breaks up to date as you change code strikes me as a waste of valuable time. You can go a long way toward using fewer line breaks if you are editing in a proportional font, since a lot more fits on the screen at once. Many editors support word wrap, and you can assign that to a hotkey to turn it on and off. (Visual Studio has it under Ctrl-RR.)

Spacing

A few rules I employ about spacing, since applying consistency is generally nice:

  • No spaces between function calls and their parameters
    e.g. Foo(a) and not Foo (a)
  • A space between keywords and their parameters
    e.g. if (a) and not if(a)
  • Spaces between constructors and their parenthesis
    e.g. new FOO (a) and not new FOO(a)
  • Spaces between formal and actual parameters
    e.g. void Foo(int a, int b) and not void Foo(int a,int b)

Not too important, but since spacing doesn’t matter, it is nice to go ahead and use it to help visually cue the differences between these different constructs.

Comments

When comments are necessary, I use the “//” C++ comment format, generally avoiding the “/* */ ” C style. I reserve the C style for surgically removing randomly ranged blocks of code for test purposes (such as parameters in the middle of long argument lists).

If I’m programming on my own projects, then in order to distinguish between comments of specific functions/classes and documentation of function/class groups, I use the following convention:

// function group comment, applies until next group comment.
// e.g., the following are Brian's Math Library Functions  
 
float sqrt(float f)
   // computes the square root using an offset table
   // iterates on the number 16 times, blah blah
   {
   ...
   }
 
float sqr(float f)
   // uses simple multiplication, special escape
   // route if zero since floating point multiplication
   // is not specialized to recognize zero
   {
   ...
   }
 
// function group comment, applies to next set
// of functions etc. etc.

This is particularly useful when integrated with Microsoft Visual Studio’s outlining feature, because the function-level comments do not interfere with the outlined display. Due to the growing success of such modes, I avoid putting in large dividing lines (e.g. comment lines full of dashes) because their function is much better achieved by outlining tools.

Sums of numbers that add to X

Sunday, October 10th, 2004

I saw the following coding problem:

Write a function that takes 4 integers as input and returns an integer. The four input values will each range from 1-10, and the function should calculate the number of combinations of input that add to 15 and return that number of combinations.

For example, if the four input values were labeled A-D and had the values 5, 5, 5 and 10 there would be 4 combinations adding to 15:

Input:
A = 5
B = 5
C = 5
D = 10

Combos:
A+D = 15
B+D = 15
C+D = 15
A+B+C = 15

Desired output: 4

Please describe how you would modify your solution to accept N input values (instead of just four).

My first try turned out to be slower than a solution from an experienced C++ programmer who’d leveraged the C++ standard library algorithms:

class Knapsack
{
public:
  Knapsack (std::vector::const_iterator beg,
      std::vector::const_iterator end,
      int sum)
    : _pieces (beg, end), _sum (sum)
    {
        // precondition: input was sorted
    }
  int UniqueCombinations ();
private:
  std::vector _pieces;
  int _sum;
};
 
// Each combination is counted only once
// Hint: If a pick contains several equal items in a row,
// they were all picked from the front of a run of
// equal items in the (sorted) input (see the skipping below)
int Knapsack::UniqueCombinations ()
{
  int count = 0;
  std::vector::const_iterator it = _pieces.begin ();
  std::vector::const_iterator end = _pieces.end ();
  assert (it != end);    
  do
  {
     int curItem = *it;
     int newSum = _sum - curItem;
     ++it;
     if (newSum == 0)
     {
       ++count;
     }
     else if (newSum > 0 && it != end)
     {
       Knapsack smallerKnapsack (it, end, newSum);
       count += smallerKnapsack.UniqueCombinations ();
     }
     // skip all equal value items
     while (it != end && *it == curItem)
       ++it;
    } while (it != end);
  return count;
}
 
int FillKnapsack (int const * arr, int size, int sum)
{
  if (size == 0)
  {
    std::cout << "0 combinations: empty array" << std::endl;
    return 0;
  }
 
  try
  {
    std::vector data (arr, arr + size);
    // sort in decreasing order
    std::sort (data.begin (), data.end (), std::greater ());
    Knapsack knapsack (data.begin (), data.end (), sum);
    int comb = knapsack.UniqueCombinations ();
 
    std::cout << comb << " combination(s)" << std::endl;
    return comb;
  }
  catch (std::bad_alloc)
  {
    std::cerr << "Out of memory" << std::endl;
  }
  catch (...)
  {
    std::cerr << "Unknown error" << std::endl;
  }
  return 0;
}

Not to be outdone, I started wondering if there was a way to do it even faster. Since the values in the array are bounded, I decided to immediately transform the input into a histogram telling how many of each number appeared. Then I would do a brute force march through all the histograms that were “recipes” adding to the target sum, and add the appropriate number of sums for each match. So the sample case:

A = 5
B = 5
C = 5
D = 10

I’d quickly transform it to:

[0,0,0,0,3,0,0,0,0,1] // three fives and one 10

Then, I’d generate the exhaustive list of “recipes” that sum to 15:

[15,0,0,0,0,0,0,0,0,0] // fifteen ones
[13,1,0,0,0,0,0,0,0,0] // thirteen ones and one two
[12,0,1,0,0,0,0,0,0,0] // twelve ones and one three

[0,0,0,0,1,0,0,0,0,1] // one ten and one five

From there it was just a matter of seeing how many times a recipe could be applied on the input. That’s relatively straightforward combinatorics, but my example cases on large data sets overflowed my naive algorithm for combinations. It was a hard bug to track down, but once I replaced it with a better routine off the web I had a blazing fast algorithm. Its performance is determined by the range of the numbers and the value of the sum, not the size of the input array.

// combinations
// how many combinations of k objects can you choose from a set of n objects ?
// this method avoids overflow that you hit with:
//    factorial(n) / (factorial(k) * factorial(n-k))
// see http://www.cs.sjsu.edu/faculty/fecteau/cs140/combPerm.html
unsigned int combinations(unsigned int r, unsigned int n)
  {
  // Assert(n >= k);
  unsigned int k = r < n-r ? r : n-r;
  unsigned int answer = 1;
  unsigned int multiplier = n;
  unsigned int divisor = 1;
  while (divisor <= k)
    {
    answer = ( answer * multiplier ) / divisor; // this will evenly divide 
    multiplier--;
    divisor++;
    }
  return answer;
  }
 
// SumsRecipe
//
// Given a "recipe" for producing a certain sum, tell how many ways that recipe
// can be achieved given the set of numbers indicated by Nums.  Both the recipe
// and the input list are set up so that array[j] tells you how many j's you have
// (or in the case of the recipe, how many j's are needed)
//
unsigned int SumsRecipe(const unsigned int Nums[],
    const unsigned int Recipe[],
    unsigned int Max)
  {
  unsigned int Sums = 1;
  for (unsigned int j = 1; j <= Max; j++)
    {
    if (Nums[j] < Recipe[j])
      return 0;
    if (Recipe[j] != 0)
      Sums *= combinations(Recipe[j], Nums[j]);
    }
  return Sums;
  }
 
// EnumRecipe
//
// Class which provides a somewhat brute-force method of enumerating over all
// the possible "Recipes" for making the value X as a sum of numbers 1-Max.
//
// It really just ticks through the recipes like an odometer, trying:
//   1 one, then 2 ones, then 3 ones...
// When it hits the maximum number of ones you can have it then tries:
//   1 two with 0 ones, then 1 two with 1 one, then 1 two with 2 ones...
//
// Although it might seem like this is a terrible way to generate these
// recipes, there are a couple of "saving graces".  We know
// we don't have more than Length numbers available, and that the total
// can't be greater than X...so if we hit an excess of either of these
// conditions we can fast-forward the "odometer" to the next setting
// which zeroes one of the wheels.
//
class EnumRecipe
  {
private:
  unsigned int m_X; // Target value for our recipe's sum
  unsigned int m_Length; // Most numbers that could occur in a recipe
  unsigned int m_Max; // Maximum value of any given number
  unsigned int m_Used; // Numbers used in this recipe
  unsigned int m_Total; // Current running total sum
  bool m_First; // First time calling the enumerator?
  const unsigned int* m_Nums; // How many of each number is available in the input?
 
public:
  EnumRecipe(unsigned int X,  
      const unsigned int* Nums, 
      unsigned int Length, 
      unsigned int Max) :
    m_X(X),
    m_Nums(Nums),
    m_Length(Length),
    m_Max(Max),
    m_Used(0),
    m_Total(0),
    m_First(true)
    { // Nothing to do
    }
 
  bool NextRecipe(unsigned int Recipe[])
    {
    if (m_First)
      // since this is the first call, start by zeroing out
      // the recipe that the user passed in...
      {
      for (unsigned int j = 0; j <= m_Max; j++)
        Recipe[j] = 0;
      m_First = false;
      // Assert(m_Total == 0);
      // Assert(m_Used == 0);
      }
 
    unsigned int Wheel = 1;
    while (true)
      {
      // Will increasing the current "Wheel" lead to a
      // "Recipe" that uses more values than we have as
      // input, or one that the recipe's summation will 
      // exceed our target X value?
      if ((m_Used < m_Length) && 
          (m_Total < m_X) && 
          (m_Nums[Wheel] >= Recipe[Wheel]))
        {
        Recipe[Wheel]++;
        m_Total += Wheel;
        m_Used++;
 
        if (m_Total == m_X)
          return true;
 
        // as on an odometer, each time we successfully
        // increment one of the wheels, the next wheel
        // we'll try to increment will be the first again
        Wheel = 1;
        }
      else
        // We reached the maximum value for this "wheel"
        {
        // Is it the last one?  If so, we're done.
        if (Wheel == m_Max)
          return false;
 
        // Zero this wheel and update our state variables.
        // This intermediate state is one that we've visited
        // before, so no need to test total against m_X
       m_Total -= (Wheel * Recipe[Wheel]);
       m_Used -= Recipe[Wheel];
       Recipe[Wheel] = 0;
       Wheel++;
        }
      }
 
    return true;
    }
 
  virtual ~EnumRecipe()
    { // Nothing to do
    }
  };
 
// SumsThatAddToXSpecialized
//
// Specialized version of code which answers the "how many ways
// are there to achieve the sum of X using the integers in Values[]".
// This case is specialized because it assumes that the
// values in the array range from 1..Max.  What's different about
// this particular approach is that the performance is not a
// function of the length of the input array.
//
unsigned int SumsThatAddToXSpecialized(
    unsigned int X, 
    const unsigned int Values[], 
    unsigned int Length,
    unsigned int Max)
  {
  unsigned int Nums[Max+1];
  for (unsigned int j = 0; j <= Max; j++)
    Nums[j] = 0;
  for (unsigned int i = 0; i < Length; i++)
    {
    // Assert((Values[i] <= Max) && (Values[i] > 0));
    Nums[Values[i]]++;
    }
 
  unsigned int Sums = 0;
 
  unsigned int Recipe[Max+1];
  EnumRecipe er (X, Nums, Length, Max);
  while (er.NextRecipe(Recipe))
    Sums += SumsRecipe(Nums, Recipe, Max);
 
  return Sums;
  }

It’s fun to see something like this run so fast and still be correct!


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported