Posts Tagged ‘cplusplus’

Some C++ Code Formatting Thoughts

Saturday, January 1st, 2005

There’s a great page on Wikipedia where people are gathering together information about programming style But here are some notes on formatting that I put together a few years ago, which shares a few of my ideas.

Overview

Adopting a system of formatting is not just a means of keeping collaborators from constantly reformatting each others files. Uniformity can be a vehicle for helping those who are approaching a new program’s source to absorb it faster. It can also make humdrum decisions for developing on that code base flow more automatically, which avoids wasting valuable time which could be spent working on the design.

All told, the conventions which are chosen are far less important than agreeing upon a system. I have faith that someday we will use development tools which abstract us from the legacy of ASCII and let us focus on the computational intent of our projects, rendering our programs to us however we want to see them. Until that day comes, an agreement must be forged for each development group that shares text files among multiple people.

Indentation

Wikipedia has a great survey of indentation styles. My own indentation favorite for a long time was the Whitesmith’s style (though I didn’t know it was called that):

void SomeFunction()
   {
   if (condition)
      {
      TakeAction1();
      TakeAction2();
      }
   }

The article sums up the advantages, and why it rose to prominence in certain domains (though it’s not so popular today):

The advantages of this style are similar to those of the Allman style in that blocks are clearly set apart from control statements. However with Whitesmiths style, the block is still visually connected to its control statement instead of looking like an unrelated block of code surrounded by whitespace. Another advantage is that the alignment of the braces with the block emphasizes the fact that the entire block is conceptually (as well as programmatically) a single compound statement. Furthermore, indenting the braces emphasizes that they are subordinate to the control statement.

I still like it the best, and it’s what I use if I’m starting something from scratch for myself. If I could go back in time and convince everyone to use it…I might. But though I do think there’s a reasonable case for it visually, there are rationales that sometimes transcend the visual. Such is the case when GNU developers format their code like this:

void SomeFunction()
{
  if (condition)
    {
      TakeAction1();
      TakeAction2();
    }
}

Which isn’t even consistent—the indentation rule for the function is asymmetric to the indentation for the block inside the if statement. Yet many important GNU developers use Emacs, where parentheses and braces in the first column are called “defun”s (for define function) and are used for automatic navigation, and other forms of processing. If you develop on GNU source and try to do it any other way, they will reformat your changes.

Whatever style I’m using, I prefer braces to have their own lines in order to call out the importance of a scope. The addition or removal of a scope is an important programming act in C++ because they trigger destructors. As an added benefit of having braces on their own line, you can prevent a spurious difference on the condition line if you change between the above and:

void SomeFunction()
   {
   if (condition)
      TakeAction1();
   }

The condition didn’t change. Why should that line be flagged with a diff?

Padding

Generally speaking, I don’t like to treat source code as “two-dimensional”…where the programmer is concerned with the absolute location of code on the screen. This means that I don’t use padding—either with spaces or tabs—to specifically align variable declarations in columns. For example, I would not write:

void SomeFunction()
   {
   int               IntegerVariable;
   float             FloatVariable;
   LPDIRECT3DDEVICE8 lpd3d;
   ...
   }

There’s obviously a lot of maintenance involved: each time you add or remove a variable you have to manually update the spacing. As with most questionable-value formatting decisions, I suggest keeping it simple and low-maintenance…shifting my effort towards the code itself. Readability can be helped in a much less labor-intensive fashion by merely using a proportional font (instead of a fixed-width font like Courier):

   void SomeFunction()
      {
      int IntegerVariable;
      float FloatVariable;
      LPDIRECT3DDEVICE8 lpd3d;
      …
      }

Many professional code editors now provide this feature. If you are still using a fixed-width font for editing source code, I suggest you try changing to something like Verdana or Times New Roman for a few days. (These work better than Arial because the typeface distinguishes lowercase “L” from uppercase “I”.) See if you’re not convinced.

Of course, that only works if you use genuine “tab” characters instead of spaces (it surprises me that not everyone does, but apparently there are some holdouts). It frees you from having to force your programming team to use any particular number of spaces for indentation, they can just choose a way to view tabs in their editor. Indenting and outdenting takes only one keypress instead of N (where N is your tab width).

So if you are indenting with spaces, stop that!!

Line breaks

I put line breaks in a few specific places which could grow into a long list of what are essentially statements. Member initialization is the case that comes to mind:

SomeClass::SomeClass(float initial_value, char* name) :
   BaseClass (universal_constant * Sine(initial_value)),
   member_name (MakeSafeString(name))
   {
   ...
   }

Generally, though, I avoid using line breaks. Trying to fit the code into an 80 column view and keeping the line breaks up to date as you change code strikes me as a waste of valuable time. You can go a long way toward using fewer line breaks if you are editing in a proportional font, since a lot more fits on the screen at once. Many editors support word wrap, and you can assign that to a hotkey to turn it on and off. (Visual Studio has it under Ctrl-RR.)

Spacing

A few rules I employ about spacing, since applying consistency is generally nice:

  • No spaces between function calls and their parameters
    e.g. Foo(a) and not Foo (a)
  • A space between keywords and their parameters
    e.g. if (a) and not if(a)
  • Spaces between constructors and their parenthesis
    e.g. new FOO (a) and not new FOO(a)
  • Spaces between formal and actual parameters
    e.g. void Foo(int a, int b) and not void Foo(int a,int b)

Not too important, but since spacing doesn’t matter, it is nice to go ahead and use it to help visually cue the differences between these different constructs.

Comments

When comments are necessary, I use the “//” C++ comment format, generally avoiding the “/* */ ” C style. I reserve the C style for surgically removing randomly ranged blocks of code for test purposes (such as parameters in the middle of long argument lists).

If I’m programming on my own projects, then in order to distinguish between comments of specific functions/classes and documentation of function/class groups, I use the following convention:

// function group comment, applies until next group comment.
// e.g., the following are Brian's Math Library Functions  
 
float sqrt(float f)
   // computes the square root using an offset table
   // iterates on the number 16 times, blah blah
   {
   ...
   }
 
float sqr(float f)
   // uses simple multiplication, special escape
   // route if zero since floating point multiplication
   // is not specialized to recognize zero
   {
   ...
   }
 
// function group comment, applies to next set
// of functions etc. etc.

This is particularly useful when integrated with Microsoft Visual Studio’s outlining feature, because the function-level comments do not interfere with the outlined display. Due to the growing success of such modes, I avoid putting in large dividing lines (e.g. comment lines full of dashes) because their function is much better achieved by outlining tools.

Sums of numbers that add to X

Sunday, October 10th, 2004

I saw the following coding problem:

Write a function that takes 4 integers as input and returns an integer. The four input values will each range from 1-10, and the function should calculate the number of combinations of input that add to 15 and return that number of combinations.

For example, if the four input values were labeled A-D and had the values 5, 5, 5 and 10 there would be 4 combinations adding to 15:

Input:
A = 5
B = 5
C = 5
D = 10

Combos:
A+D = 15
B+D = 15
C+D = 15
A+B+C = 15

Desired output: 4

Please describe how you would modify your solution to accept N input values (instead of just four).

My first try turned out to be slower than a solution from an experienced C++ programmer who’d leveraged the C++ standard library algorithms:

class Knapsack
{
public:
  Knapsack (std::vector::const_iterator beg,
      std::vector::const_iterator end,
      int sum)
    : _pieces (beg, end), _sum (sum)
    {
        // precondition: input was sorted
    }
  int UniqueCombinations ();
private:
  std::vector _pieces;
  int _sum;
};
 
// Each combination is counted only once
// Hint: If a pick contains several equal items in a row,
// they were all picked from the front of a run of
// equal items in the (sorted) input (see the skipping below)
int Knapsack::UniqueCombinations ()
{
  int count = 0;
  std::vector::const_iterator it = _pieces.begin ();
  std::vector::const_iterator end = _pieces.end ();
  assert (it != end);    
  do
  {
     int curItem = *it;
     int newSum = _sum - curItem;
     ++it;
     if (newSum == 0)
     {
       ++count;
     }
     else if (newSum > 0 && it != end)
     {
       Knapsack smallerKnapsack (it, end, newSum);
       count += smallerKnapsack.UniqueCombinations ();
     }
     // skip all equal value items
     while (it != end && *it == curItem)
       ++it;
    } while (it != end);
  return count;
}
 
int FillKnapsack (int const * arr, int size, int sum)
{
  if (size == 0)
  {
    std::cout << "0 combinations: empty array" << std::endl;
    return 0;
  }
 
  try
  {
    std::vector data (arr, arr + size);
    // sort in decreasing order
    std::sort (data.begin (), data.end (), std::greater ());
    Knapsack knapsack (data.begin (), data.end (), sum);
    int comb = knapsack.UniqueCombinations ();
 
    std::cout << comb << " combination(s)" << std::endl;
    return comb;
  }
  catch (std::bad_alloc)
  {
    std::cerr << "Out of memory" << std::endl;
  }
  catch (...)
  {
    std::cerr << "Unknown error" << std::endl;
  }
  return 0;
}

Not to be outdone, I started wondering if there was a way to do it even faster. Since the values in the array are bounded, I decided to immediately transform the input into a histogram telling how many of each number appeared. Then I would do a brute force march through all the histograms that were “recipes” adding to the target sum, and add the appropriate number of sums for each match. So the sample case:

A = 5
B = 5
C = 5
D = 10

I’d quickly transform it to:

[0,0,0,0,3,0,0,0,0,1] // three fives and one 10

Then, I’d generate the exhaustive list of “recipes” that sum to 15:

[15,0,0,0,0,0,0,0,0,0] // fifteen ones
[13,1,0,0,0,0,0,0,0,0] // thirteen ones and one two
[12,0,1,0,0,0,0,0,0,0] // twelve ones and one three

[0,0,0,0,1,0,0,0,0,1] // one ten and one five

From there it was just a matter of seeing how many times a recipe could be applied on the input. That’s relatively straightforward combinatorics, but my example cases on large data sets overflowed my naive algorithm for combinations. It was a hard bug to track down, but once I replaced it with a better routine off the web I had a blazing fast algorithm. Its performance is determined by the range of the numbers and the value of the sum, not the size of the input array.

// combinations
// how many combinations of k objects can you choose from a set of n objects ?
// this method avoids overflow that you hit with:
//    factorial(n) / (factorial(k) * factorial(n-k))
// see http://www.cs.sjsu.edu/faculty/fecteau/cs140/combPerm.html
unsigned int combinations(unsigned int r, unsigned int n)
  {
  // Assert(n >= k);
  unsigned int k = r < n-r ? r : n-r;
  unsigned int answer = 1;
  unsigned int multiplier = n;
  unsigned int divisor = 1;
  while (divisor <= k)
    {
    answer = ( answer * multiplier ) / divisor; // this will evenly divide 
    multiplier--;
    divisor++;
    }
  return answer;
  }
 
// SumsRecipe
//
// Given a "recipe" for producing a certain sum, tell how many ways that recipe
// can be achieved given the set of numbers indicated by Nums.  Both the recipe
// and the input list are set up so that array[j] tells you how many j's you have
// (or in the case of the recipe, how many j's are needed)
//
unsigned int SumsRecipe(const unsigned int Nums[],
    const unsigned int Recipe[],
    unsigned int Max)
  {
  unsigned int Sums = 1;
  for (unsigned int j = 1; j <= Max; j++)
    {
    if (Nums[j] < Recipe[j])
      return 0;
    if (Recipe[j] != 0)
      Sums *= combinations(Recipe[j], Nums[j]);
    }
  return Sums;
  }
 
// EnumRecipe
//
// Class which provides a somewhat brute-force method of enumerating over all
// the possible "Recipes" for making the value X as a sum of numbers 1-Max.
//
// It really just ticks through the recipes like an odometer, trying:
//   1 one, then 2 ones, then 3 ones...
// When it hits the maximum number of ones you can have it then tries:
//   1 two with 0 ones, then 1 two with 1 one, then 1 two with 2 ones...
//
// Although it might seem like this is a terrible way to generate these
// recipes, there are a couple of "saving graces".  We know
// we don't have more than Length numbers available, and that the total
// can't be greater than X...so if we hit an excess of either of these
// conditions we can fast-forward the "odometer" to the next setting
// which zeroes one of the wheels.
//
class EnumRecipe
  {
private:
  unsigned int m_X; // Target value for our recipe's sum
  unsigned int m_Length; // Most numbers that could occur in a recipe
  unsigned int m_Max; // Maximum value of any given number
  unsigned int m_Used; // Numbers used in this recipe
  unsigned int m_Total; // Current running total sum
  bool m_First; // First time calling the enumerator?
  const unsigned int* m_Nums; // How many of each number is available in the input?
 
public:
  EnumRecipe(unsigned int X,  
      const unsigned int* Nums, 
      unsigned int Length, 
      unsigned int Max) :
    m_X(X),
    m_Nums(Nums),
    m_Length(Length),
    m_Max(Max),
    m_Used(0),
    m_Total(0),
    m_First(true)
    { // Nothing to do
    }
 
  bool NextRecipe(unsigned int Recipe[])
    {
    if (m_First)
      // since this is the first call, start by zeroing out
      // the recipe that the user passed in...
      {
      for (unsigned int j = 0; j <= m_Max; j++)
        Recipe[j] = 0;
      m_First = false;
      // Assert(m_Total == 0);
      // Assert(m_Used == 0);
      }
 
    unsigned int Wheel = 1;
    while (true)
      {
      // Will increasing the current "Wheel" lead to a
      // "Recipe" that uses more values than we have as
      // input, or one that the recipe's summation will 
      // exceed our target X value?
      if ((m_Used < m_Length) && 
          (m_Total < m_X) && 
          (m_Nums[Wheel] >= Recipe[Wheel]))
        {
        Recipe[Wheel]++;
        m_Total += Wheel;
        m_Used++;
 
        if (m_Total == m_X)
          return true;
 
        // as on an odometer, each time we successfully
        // increment one of the wheels, the next wheel
        // we'll try to increment will be the first again
        Wheel = 1;
        }
      else
        // We reached the maximum value for this "wheel"
        {
        // Is it the last one?  If so, we're done.
        if (Wheel == m_Max)
          return false;
 
        // Zero this wheel and update our state variables.
        // This intermediate state is one that we've visited
        // before, so no need to test total against m_X
       m_Total -= (Wheel * Recipe[Wheel]);
       m_Used -= Recipe[Wheel];
       Recipe[Wheel] = 0;
       Wheel++;
        }
      }
 
    return true;
    }
 
  virtual ~EnumRecipe()
    { // Nothing to do
    }
  };
 
// SumsThatAddToXSpecialized
//
// Specialized version of code which answers the "how many ways
// are there to achieve the sum of X using the integers in Values[]".
// This case is specialized because it assumes that the
// values in the array range from 1..Max.  What's different about
// this particular approach is that the performance is not a
// function of the length of the input array.
//
unsigned int SumsThatAddToXSpecialized(
    unsigned int X, 
    const unsigned int Values[], 
    unsigned int Length,
    unsigned int Max)
  {
  unsigned int Nums[Max+1];
  for (unsigned int j = 0; j <= Max; j++)
    Nums[j] = 0;
  for (unsigned int i = 0; i < Length; i++)
    {
    // Assert((Values[i] <= Max) && (Values[i] > 0));
    Nums[Values[i]]++;
    }
 
  unsigned int Sums = 0;
 
  unsigned int Recipe[Max+1];
  EnumRecipe er (X, Nums, Length, Max);
  while (er.NextRecipe(Recipe))
    Sums += SumsRecipe(Nums, Recipe, Max);
 
  return Sums;
  }

It’s fun to see something like this run so fast and still be correct!


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported