Archive for the ‘C++’ Category

Treating Non-Boolean Types as Logic Values

Monday, October 26th, 2009

Historically, I have always disliked the idea of using boolean test operations on types that can take a number of values. This deep-seated aesthetic grudge comes from cases like this:

int x = 0;
if (not x)
   cout << "X is zero" << endl;

I’m simply bothered by this. Perhaps it’s a cognitive thing where I just don’t feel there’s any “truth” to the idea that every non-zero integer is somehow “true” but zero is somehow “false”. It could just as easily be that negative numbers are false and non-negative numbers are true. Or if you ask a mathematical purist, they might suggest that at a more foundational level it is primes that are true and non-primes are false!

My comfort zone is when I’m only testing for “truth” and “falsehood” those things that can only have the values true and false. If I could, I would enforce this. So in the example above I would always write something more like:

int x = 0;
if (x == 0)
   cout << "X is zero" << endl;

That particular example is not very controversial, and I think most programmers would agree with me that’s a better way to test against a literal zero. It’s just better code for capturing the intention.
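For what it's worth, this kind of enforcement can be approximated in C++11 with a deleted overload. What follows is only a sketch: `truth`, `isZero` and `checkIsZero` are made-up names for illustration, not part of any standard library.

```cpp
#include <cassert>

// Hypothetical helper: a genuine bool is passed through unchanged...
inline bool truth(bool condition) { return condition; }

// ...while any other type is rejected at compile time. For an int or a
// pointer the template is an exact match, so overload resolution prefers it
// to the bool overload (which would need a conversion) and hits "= delete".
template <class T>
bool truth(T) = delete;

// Example use: true only when x is zero.
inline bool isZero(int x) {
    // return truth(x);     // would not compile: int is not bool
    return truth(x == 0);   // fine: a comparison yields bool
}

// Small self-check of the helper above.
inline void checkIsZero() {
    assert(isZero(0));
    assert(!isZero(7));
}
```

Writing `if (truth(x == 0))` is noisier than a bare `if`, so this is more a demonstration that the rule can be checked by the compiler than a recommendation.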

Yet I have historically considered that a logical extension of “zero is not false” is the premise that “null pointers are not false”. And that belief is contrary to practice:

shared_ptr< foo > fooPointer;
if (not fooPointer)
   cout << "fooPointer is null" << endl;

In fact, look what a mouthful you get if you insist on comparing against null literally:

shared_ptr< foo > fooPointer;
if (fooPointer.get() == nullptr)
   cout << "fooPointer is null" << endl;

In the past I have tried to work around this by creating inline template wrappers like “isNull()” so that I could stick to my guns and avoid flattening pointers into booleans. It doesn’t cost any more in the runtime, so I figured what’s the harm?

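template< class T > inline bool isNull(const shared_ptr< T >& ptr) {
	// Taken by const reference: passing by value would copy the pointer and bump its reference count.
	return !ptr;
}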

Yet I’ve decided this is a lost cause and too much of a speed bump in sharing my code with other C++ programmers. They’ve accepted the notion that null pointers are false and non-null pointers are true, and you use boolean logic to test this—not some other operator. It’s not worth it to pick this particular fight.

I’m not happy about it. But I now accept this for pointers only. And I’m going to use “not” instead of “!”… it’s part of the language and a lot more readable.

Smart Pointer Casting Study

Friday, July 10th, 2009

If you’re using public inheritance in C++, the compiler will implicitly “upcast” from a Derived class pointer to a Base class pointer. So I thought a std::auto_ptr to a Derived class would have a similar implicit upcast. It does…but only for assignment and construction!

auto_ptr_cast.cpp

The example shows that if you try to pass an auto_ptr<Derived> to a function that takes an auto_ptr<Base>, it fails to convert due to an ambiguity. You must put an explicit static_cast< auto_ptr<Base> > at the calling site.

This “minor” problem led me into an investigation of the state of auto_ptr and its alternative, unique_ptr. I sought the wisdom of friends and people on the Freenode IRC. I even wound up building a newer version of gcc than the one that came with the latest Kubuntu distribution! Though my findings are not the most exciting subject for a blog, I thought writing them up might help someone.
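As a rough illustration of the unique_ptr side of that comparison, here is a minimal sketch assuming a C++11 compiler. The names `Base`, `Derived`, `id`, `consume` and `passDerived` are invented for the example and are not taken from auto_ptr_cast.cpp.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Base {
    virtual ~Base() {}                    // virtual, so deleting via Base* is safe
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Takes ownership of whatever Base it is handed.
inline int consume(std::unique_ptr<Base> base) {
    return base->id();
}

inline int passDerived() {
    std::unique_ptr<Derived> derived(new Derived);
    // No static_cast at the call site: unique_ptr<Derived> converts
    // implicitly to unique_ptr<Base>. Only the transfer of ownership has
    // to be spelled out, with std::move.
    const int result = consume(std::move(derived));
    assert(!derived);                     // the moved-from pointer is now null
    return result;
}
```

The upcast itself is implicit here; what the caller has to write explicitly is the handover of ownership.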

So I have good news, and bad news…


8-Year-Olds Should *Read* My Code

Tuesday, June 16th, 2009

A couple years ago, I read an article that gained popularity on social-bookmarking sites which was entitled “8-year-olds should test my code”. It’s a story about a child named Brian (no relation :P), who crashed UCBLogo only seconds after encountering it for the first time:

Logo Crash Caused By 8-Year-Old

The author is an engineer at Google, and said this:

“I had played with UCBLogo for two weeks and hadn’t made it crash once. Brian brought the whole thing down in three commands. The most telling part is that when I tried to reproduce the defect a week later I couldn’t. I issued rt with a ton of 9s and just couldn’t get it to break. As it turns, it only crashes when you omit the space, which of course I didn’t think of doing. It took me more time to reproduce the defect than it took Brian to discover it.”

We’re offered the conclusion that we need legions of 8-year-old testers, since their lack of preconceptions makes them great sources of unanticipated input. I strongly disagree.

For one thing, automated fuzz testing can be made much more genuinely random. But more importantly: 8-year-olds have better things to do than feed random data into programs that were developed using defective methods! It’s much more gratifying if kids are using solid software tools that enable creativity and learning. Even better is if their curiosity about the tool can be satisfied by reading its implementation!

This is not as unattainable as it sounds. I’ll go deeper into this example to make my case…by showing what caused this bug and how far ahead modern techniques are.


Modern C++… or Modern Art?

Tuesday, March 31st, 2009

In the preface to his book Modern C++ Design, Andrei Alexandrescu paints a vision of what programming should be like:

“Imagine the following scenario. You come from a design meeting with a couple of printed diagrams, scribbled with your annotations. Okay, the event type passed between these objects is not char anymore; it’s int. You change one line of code. The smart pointers to Widget are too slow; they should go unchecked. You change one line of code. The object factory needs to support the new Gadget class just added by another department. You change one line of code.

You have changed the design. Compile. Link. Done.”

This is a very nice theory. But as C++ programming has remained relevant only among a small (yet important) “fringe” of developers, they have been flexing the standards toward an uncompromising pursuit of this vision. The results are somewhat extreme and not generally easy to work with.

In this article I will talk briefly about what is happening and what I think of the aesthetics.


In Defense of Hungarian Notation (with caveats)

Saturday, September 24th, 2005

Anyone who has programmed directly to the Windows API knows about the existence of Hungarian Notation. It is a way of making the name of a variable or procedure flow automatically from its data type. Like other conventions that have been rejected by the general programming community, it would be foolish to use it today on any public API or code example. Despite this, I do still borrow from some of the “spirit” of the notation when I code.

I’d like to explain why.

Some critics (and adherents) of Hungarian Notation think its goal is to encode useful type information into names. The truth is, coming up with useful names is not the point. It is much more about avoiding the encoding of useless information!

Names—like indentation, spacing, and comments—do not affect the executable code. For instance, look at this code:

void DestroyTheWindow(HWND TheWindowIWillDestroy)
   // This function should only be called on windows which
   // have no parents; if you would like to destroy a
   // window which has a parent, then you must destroy 
   // the window through the parent. Failure to do so
   // will skip essential window layer cleanup routines
   // and a later crash may ensue.
   {
   ...
   }

No matter how nice the prose, that programmer spent their finite allocation of time on earth unwisely. Given the 30 seconds (at least!) it took them to write that comment, I’d rather they had written:

void DestroyWindow(HWND hwnd)
   {
   assert(GetParent(hwnd) == NULL);
   ...
   }

…or even better, invested that time in creating distinct handle types in place of a bare HWND (like HWND_TOPLEVEL and HWND_CHILD) so that the error could be caught at compile-time:

void DestroyToplevel(HWND_TOPLEVEL hwnd)
   {
   ...
   }

Even if HWND_TOPLEVEL and HWND_CHILD are merely typedefs for HWND, I think this is better documentation than a comment in the long run. It conveys all the same information and can easily grow into a compiler-checked solution, if you were to upgrade the definitions from typedefs into distinct classes.
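To make the "upgrade" concrete, here is a sketch of what distinct classes might look like. Everything in it is an assumption for illustration: `HWND` is a stand-in `void*` so the code compiles without the Windows headers, the wrapper structs and their `raw` member are hypothetical, and the body of `DestroyToplevel` is a placeholder.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the real Windows handle so the sketch compiles anywhere.
typedef void* HWND;

// Hypothetical wrappers: each holds a plain HWND but is its own type.
struct HWND_TOPLEVEL {
    explicit HWND_TOPLEVEL(HWND handle) : raw(handle) {}
    HWND raw;
};

struct HWND_CHILD {
    explicit HWND_CHILD(HWND handle) : raw(handle) {}
    HWND raw;
};

// Neither a child handle nor a bare HWND converts silently to a top-level one.
static_assert(!std::is_convertible<HWND_CHILD, HWND_TOPLEVEL>::value,
              "child handles must not pass as top-level handles");
static_assert(!std::is_convertible<HWND, HWND_TOPLEVEL>::value,
              "bare handles must be wrapped explicitly");

// Only a top-level handle is accepted; no comment about parents is needed.
inline bool DestroyToplevel(HWND_TOPLEVEL hwnd) {
    assert(hwnd.raw != 0);
    return true;   // placeholder for the real teardown
}
```

With these definitions, calling `DestroyToplevel` with an `HWND_CHILD` fails to compile, which is the check the long comment was trying to make in prose.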

Before anyone revokes my programming license, I’m not saying people shouldn’t comment. Yet a programmer only has twenty-four hours in a day (disregarding sleep, of course)… and therefore any time invested in comments is energy that was not put into fixing the code so that comment wasn’t necessary. Naming is the same way—a creative name is something a user will not be able to appreciate in terms of runtime features.

So to bring our discussion back to the core of Hungarian Notation’s value for your programming mind, let’s look at the situation of someone naming a function parameter:

void DestroyWindow(HWND /*[name goes here]*/)
   {
   ...
   }

I could sit around all day debating whether to call it Input or ToBeDestroyedWindow or TheWindowIWillDestroy or cryptically x. Yet what this variable represents is obvious from the context—after all, it is the sole parameter to a routine called DestroyWindow. It’s the window to destroy (Duh!)

One way of avoiding a meaningless name would be to give the variable some unique number:

void DestroyWindow(HWND noname1231 /* hope this is unique! */)
   {
   ...
   }

In the future, our integrated development environments might be able to manage such numeric identities for declarations “behind the scenes”. This would be a lot like how databases invisibly manage table relationships through primary keys. But so long as we’re directly modifying textual code, humans can’t really match up these numbers while reading. So this is a bad idea.

But what if you made all your type names in capital letters? Then you could turn the type into lowercase letters to produce an available symbol:

void DestroyWindow(HWND hwnd)
   {
   ...
   }

Assuming that turning your type into lowercase doesn’t produce a language keyword, then you’ll always get a legal identifier. Moreover, the name isn’t completely useless: anywhere you see a reference to this “nameless” variable you know at least one thing about it: its type.

Naturally, this method of producing a unique symbol breaks down once you have more than one variable of a specific type in a scope:

HWND hwnd; // topmost window in the Z order
HWND hwnd; // parent window (**ERROR, NAME ALREADY USED**)

Yet if there’s more than one variable of the same type in a scope, that means that context alone is by definition insufficient to explain your variable’s purpose. Hungarian Notation prescribes adding a disambiguating mixed-case phrase to the end of the name. It is especially efficient, because the disambiguation always tacks onto the end of names—which is easy to add and remove in the editor if you ever run up against a collision:

HWND hwndTopmostInZOrder;
HWND hwndParent;

This only works if you create lots of new types. If you have an integer value, and you prefix a variable with the letter “i”, context is probably not sufficient to know what the variable is for. So that raises the question: why are you using an integer and not a higher level abstraction like “line number” (LINE), “count of employees” (CEMP), or a “stack depth counter” (SDC)?

If the Y2K problem taught us anything, it’s that you should be very liberal in creating new types which capture your ideas—even if it’s just a measly preprocessor macro. Compare:

BYTE todays_date_in_MM_DD_YY[3];

with:

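typedef BYTE DATE[3]; // a typedef, because "#define DATE BYTE[3]" would expand to the invalid declaration "BYTE[3] date;"
DATE date;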

Even though the implementations are isomorphic, the second pattern is far better. There is no automatic way to find all the dates in the first example…you have to do manual inspection of all the names of BYTE arrays to figure out which are dates and which are not. This is why I don’t hesitate to create new types while I program, even wrappers for basic types like int or long. Once you’ve done that, you can feel good about variable declarations like LINE lineFirst; or CEMP cempHiredLastMonth; or just SDC sdc;.
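A sketch of what such wrappers might look like in C++ follows. The one-member struct layout, the member name `n`, and the functions `LineAfter` and `CempTotal` are my own assumptions for illustration; a plain typedef would give the searchable name but not the compile-time separation shown here.

```cpp
#include <cassert>

// Hypothetical one-member wrappers: the storage is still a plain int, but
// the compiler now treats each idea as its own type.
struct LINE { int n; };   // line number
struct CEMP { int n; };   // count of employees

// Accepts only a LINE; handing it a CEMP or a bare int is a compile error.
inline LINE LineAfter(LINE line) {
    assert(line.n >= 0);  // assumption for this sketch: lines are never negative
    LINE lineNext = { line.n + 1 };
    return lineNext;
}

// Adds two employee counts; mixing in a LINE by accident will not compile.
inline CEMP CempTotal(CEMP cempExisting, CEMP cempHiredLastMonth) {
    CEMP cempTotal = { cempExisting.n + cempHiredLastMonth.n };
    return cempTotal;
}
```

Searching the code for `LINE` or `CEMP` then finds every line number or employee count, which is exactly what the BYTE array version could not offer.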

If you aren’t making tons of types, and just sticking “i” or “l” in front of everything, forget about Hungarian. It will be useless, and makes what Linus Torvalds says absolutely true:

“Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer.”

I’m not surprised Hungarian Notation has a terrible reputation, because I’ve never seen published code that used it right. Microsoft even screwed it up in their most public APIs! Just look at the definition for window procedures:

long FAR PASCAL WndProc(
   HWND hwnd,
   UINT msg,
   UINT wParam,
   LONG lParam);

The only part they did right is the HWND. The rest is a complete mess. A more genuine attempt would probably look like:

typedef UINT WPARAM;
typedef LONG LPARAM;
typedef LONG LRESULT;
typedef UINT WM; // (W)indows (M)essage
 
LRESULT FAR PASCAL LresultWndproc(
   HWND hwnd,
   WM wm,
   WPARAM wparam,
   LPARAM lparam);

Some very reasonable people suggest that since so few programmers have grasped the “true” spirit of Hungarian Notation, something must be inherently confusing about it. Therefore nearly everyone should avoid it.

I mostly agree.

Yet I simply can’t stand being forced to give a meaningless name to something which is obvious from its context. That’s why I came up with a compromise. I simply make a mental association of a shorthand for each data type that I’m using (such as “WND” for System.Window) without changing the definition. This means code starts to look like:

void DestroyWindow(System.Window wnd)
   {
   System.Window wndParent = wnd.GetParent();
   System.Window wndTemp;
   foreach (wndTemp in wndParent.Children())
      {
      if (wndTemp == wnd)
         ...
      }
   }

This gives me what I desire, and avoids the taboo of data types that are in all capital letters. (I don’t know why, but people have reserved all caps for naming constants. It’s a convention that never made sense to me—mixed case constants are more readable. Oh well.)

One quirk I have adopted is to use “is” as the prefix for booleans. I still think it’s in the spirit of Hungarian, and boolean isTheUserOnline reads a bit better than boolean boolUserOnline. Here’s another example demonstrating my naming technique:

void KickUserOffline(UserObject userKick, UserObject userAdministrator)
   {
   assert(userAdministrator.HasPrivilege(PRIVILEGE_KICK));
   if (userKick.isTheUserOnline)
      {
      userKick.Message("You've been kicked off.");
      userKick.Logout();
      }
   }

I hope I’ve presented this clearly. The code examples that appear on HostileFork.com will use this technique where possible, and I welcome feedback if someone has a better way.


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported