Archive for the ‘Philosophy’ Category

Computer Languages as Artistic Medium

Wednesday, April 23rd, 2008

Not many people have the ability to look at the few fundamental elements of a system (such as constructs of a new computer language) and extrapolate the long-term implications of their use. But rather than jump in and tackle the difficult question of understanding the expressive power and limits of computation, I’ll start by talking about a simpler problem. Don’t panic: it involves crayons and coloring!

This example is going to seem pretty easy on the surface, and you may be familiar with it already. It involves trying to find a way to color in maps so that no areas which share a border are going to be the same color. Here’s an example of a simple coloring with 5 crayons, and then finding a method for doing it with only 4 (taken from Wikipedia’s article on the subject):

fourcolor_wikipedia

Without prior knowledge, the average Joe can only guess at how many crayons he’d need to reliably fill in any map he is given. Just by trying a few cases, he might get an intuition that a small box of different colors will do the trick. But if Joe knew the Devil was going to be presenting him with an arbitrary map and getting his soul if he couldn’t color it properly, he’d probably buy Crayola’s biggest box…120 colors! (Plus, because he doesn’t know for sure that even THAT would be enough, he’d also probably try to round up the discontinued colors off of eBay!)

Yet there is salvation from this kind of paranoia. Early formalisms of the map-coloring-problem from mathematicians comforted us with proof that it *never* took more than 5 different colors…and that 3 will certainly not always be enough. After considerable debate and experimentation over the years, we ultimately found that we only need four unique colors for any map the Devil—or anyone else—makes.

(Note: Depending on your purposes, discovering it was 4 vs. 5 might be somewhat academic. I myself feel that the more crucial step was bringing in math to save us from needing to pack a potentially infinite number of crayons.)

Now…what if the “Devil’s Map” you must solve is instead an arbitrary selection from mankind’s constant creation of new ideas for software? That raises a lot of questions about what a programmer needs in their backpack to be confident they could correctly “color in” any program specification with a working implementation.

(more…)

Tying Undo/Redo Actions to a Single User Event

Sunday, November 25th, 2007

Windows Explorer has a rather odd quirk when you rename a file. If you happen to have several items selected, all those selected files are given the exact same name. I don’t like the behavior and it only ever happens to me on accident, but Microsoft documented it… so I’ll let it slide for now.

What I *will* complain about is what happens when I try to undo one of these multi-renames. Despite only running one command, you have to press undo multiple times to restore your state! You have to actually hit undo for each file you had selected. (Adding injury to insult, Windows only keeps 10 undo items at a time—so you *can’t* get back to your initial state if you had 11 or more files selected.)

I demonstrate this defect for you in the video below. Windows Explorer isn’t the only program that does this, so after the video I’ve written about how we can architect our software so that it will never require more than one undo per “user command”:

Why would this happen?

To understand why this happens, you have to know a little bit about how a typical undo manager works. Most applications simply have a list of objects representing recent commands that have been executed. These “Command” objects usually have the following methods:

  1. “Run” (or “Do”): to execute the command initially, while storing enough information inside the command object that it may revert the effect
  2. “Undo”: remove the changes the command made by using the stored information (assuming the relevant application state is in the precise condition after the command was finished)
  3. “Redo”: bring back the effects of the command using the stored information (assuming the relevant application state is in the precise condition as when the command was initially run)

It might seem that “Run” and “Redo” are redundant, but they actually perform different functions. Just imagine a command that inserts the current time into your word processing document. If someone runs this at 4:00 and then undoes it at 5:00, they expect a redo moments later to restore the “4:00″ they just undid…not inject “5:01″! The undo and redo methods merely playback the action data that was stored during the command—by design.

Once you understand that a simple command processor will only undo one “Command” at a time, it becomes easy to intuit what Windows Explorer’s problem must be. Somewhere in the shell there is code which looks vaguely like this:

class RenameCommand : public Command
  {
  string filenameNew;
  string filenameOld;
  FileHandle fh;
 
  RenameCommand(FileHandle fh, string filenameNew)
    {
    this->fh = fh;
    this->filenameNew = filenameNew;
    }
 
  void Run()
    {
    filenameOld = fh->GetCurrentName();
    fh->SetName(filenameNew);
    }
 
  void Undo()
    {
    assert(fh->GetCurrentName() == filenameNew);
    fh->SetName(filenameOld);
    }
 
  void Redo()
    {
    assert(fh->GetCurrentName() == filenameOld);
    fh->SetName(filenameNew);
    }
  };

Then when it came time to implement the multiple renaming facility, the Microsoft programmers didn’t modify RenameCommand to take a list of files. They simply told the command processor to invoke several of them in series:

string filenameNew = Shell.GetNameFromUser();
FileHandle fh;
foreach (fh, Shell.GetSelectedFileHandles())
  {
  Command* cmd = new RenameCommand(fh, fileNameNew);
  commandProcessor.RunUndoableCommand(cmd);
  }

This is where they get into trouble. They’ve added multiple undo items to the command processor’s list for what conceptually should have been a single command.

Tying transactions directly to user events

A naive workaround would be to add the idea of a transaction that encapsulates multiple commands into an “undo group”:

commandProcessor.BeginUndoGroup();
 
string nameNew = Shell.GetNameFromUser();
FileHandle fh;
foreach (fh, Shell.GetSelectedFileHandles())
  {
  Command* cmd = new RenameCommand(fh, newName);
  commandProcessor.RunCommand(cmd);
  }
 
commandProcessor.EndUndoGroup();

Yet exposing an API like this from your command processor will not protect against the bug in any general sense. What will solve the general problem is if you only permit a few spots in the main UI loop to access BeginUndoGroup() and EndUndoGroup(). Those special places are the points where the user triggered an event that they consider “conceptually atomic”. Some good examples of these privileged moments are:

  1. a key press or running an accelerator key
  2. the selection of a menu item
  3. the pushing of a toolbar button
  4. the release of the mouse after a dragging operation

Obviously each individual mouse movement shouldn’t generate an undo group—in a paint program you wouldn’t want to have to undo each pixel from a brush stroke! Yet simply calling BeginUndoGroup() when the mouse goes down and then EndUndoGroup() when the mouse goes up isn’t the ideal solution. The problem is better solved by not allowing the mouse action to submit a command to the command processor until the mouse button is released. Until that time, the program just accumulates state from the mouse’s movement that will ultimately be used by the command’s Run() method.

There are so many benefits to deferring the calls to BeginUndoGroup() and EndUndoGroup() until the mouse button is released that I’d have a hard time condensing them all here. The features this enables warrant articles of their own! Savvy GUI developers can probably guess that most of the practical benefits relate to not having to pump UI messages while the command processor is inside a “transaction”. Yet there are more fun results, such as the ability to gracefully suspend a mouse operation when an application loses focus.

Protecting against stray document modifications

One way to get even more power out of this architectural pattern is to add runtime checks to ensure that none of your documents can be written to unless a BeginUndoGroup() is in effect. This way you protect yourself from writing a program that persistently modifies a user’s document while they are merely hovering over the application’s window, or running in the idle loop. During these times there would be no user event with which to associate the effects, so the undo behavior is going to seem random.

I know many programs as they are currently written would choke on these strict rules—but look closely and ask yourself how you expected the undo/redo to work otherwise? You probably are just hiding bugs very much like the one in Explorer above. Ensuring that a user-motivated undo group is always in effect before invoking a command with the power to make document modifications will protect against a number of awkward scenarios.

(Note for C++ programmers: to take this even farther and protect against stray document modifications at compile-time, try my suggestion of “extreme” use of const’s transitive power! If the only place your application gives out non-const pointers is as a parameter to the Command.Run() method, you guard against all kinds of accidents.)

Final thoughts

I do want to add in the caveat that there are probably scenarios where you want to hit undo fewer times than the number of user events. In Microsoft Word, typing a sentence and hitting Ctrl-Z will remove the whole sentence—and that’s sometimes what the user wants. Yet even in these cases of providing higher-order undo commands (which I approve of), the user should still always be allowed able to undo on a per-operation basis.

Also, there is no way to avoid the multiple-undo situation when a third-party software tool is directly manipulating the user interface of an application on a user’s behalf. A devil’s advocate might even argue that the bug in the Windows Shell happens because renaming multiple files isn’t an intrinsic function of Windows Explorer, but rather a convenience provided by a separate “Windows Shell Extension Tool”. Yet this is hogwash, since any system advertising an extensible architecture should be able to handle those plug-ins gracefully within its undo model.

In summary: I am convinced that requiring undo more times than the number of user events is a sign of poor design. Your undo/redo model will be clean and solid if you manage your undo groupings according to the guidelines above. I’d love to hear any success or failure stories people have of working with this approach, so please comment.

Bribing developers to make their work free

Saturday, October 20th, 2007

Some developers of closed-source/commercial software are holding onto the idea that a program they have written is going to make them a lot of money in the future. Yet by and large, many small software projects—even very good ones—will not ever make money if they are held on to tightly. Some do make a few hundred dollars a month, but it’s usually a fantasy to believe that it will make the developer rich and famous. This fantasy keeps them from sharing their source and possibly merging ideas with other similar codebases and generating something even better for the public.

One possibility would be to convince the developer of a piece of good software to hand over their work by giving them a lump sum. This sum may be less than the hoped-for long-term revenue in the dreams of the recipient, but sufficient to fund the hours of their hobby, and offset the loss of the residual income. This has happened on at least one instance I know about. For 100,000 euros, the makers of Blender were willing to transfer ownership of their project to the free software community rather than let the project disappear when the company went out of business.

Can more people be paid off? If so, one helpful tool would be a project called Fundable. It lets you pledge funds to a pool for a specific purpose—and then the funds you donate are only billed in the event that the total donation goal is reached.

Something that might help people be more willing to offer money would be if developers presented a reasoned rationale for why they needed it. Discussing one’s budget in an open way and putting it under the scrutiny of investors may not appeal to most independent developers. But the experiment has been conducted by at least one individual—Jason Rohrer—who has disclosed his personal expenses and concluded that it would take less than $1000 a month to allow him to pursue free software development full time.

It’s not clear that he has been successful in this, but it’s a brave idea. I do not personally use the software he develops and so I am hard pressed to determine if this represents a good value proposition compared to other efforts. But it’s a fascinating precedent that might make people more willing to donate to buy public rights to the code of a particular developer (especially if it’s on an ongoing basis where their commitment to doing further work can be assessed).

In Defense of Hungarian Notation (with caveats)

Saturday, September 24th, 2005

Anyone who has programmed directly to the Windows API knows about the existence of Hungarian Notation. It is a way of making the name of a variable or procedure flow automatically from its data type. Like other conventions that have been rejected by the general programming community, it would be foolish to use it today on any public API or code example. Despite this, I do still borrow from some of the “spirit” of the notation when I code.

I’d like to explain why.

Some critics (and adherents) of Hungarian Notation think its goal is to encode useful type information into names. The truth is, coming up with useful names is not the point. It is much more about avoiding the encoding of useless information!

Names—like indentation, spacing, and comments—do not affect the executable code. For instance, look at this code:

void DestroyTheWindow(HWND TheWindowIWillDestroy)
   // This function should only be called on windows which
   // have no parents; if you would like to destroy a
   // window which has a parent, then you must destroy 
   // the window through the parent. Failure to do so
   // will skip essential window layer cleanup routines
   // and a later crash may ensue.
   {
   ...
   }

No matter how nice the prose, that programmer spent their finite allocation of time on earth unwisely. Given the 30 seconds (at least!) it took them to write that comment, I’d rather they had written:

void DestroyWindow(HWND hwnd)
   {
   assert(GetParent(hwnd) == NULL);
   ...
   }

…or even better, invested that time in creating subclasses of HWND (like HWND_TOPLEVEL and HWND_CHILD) so that the error could be caught at compile-time:

void DestroyToplevel(HWND_TOPLEVEL hwnd)
   {
   ...
   }

Even if HWND_TOPLEVEL and HWND_CHILD are merely typedefs for HWND, I think this is better documentation than a comment in the long run. It conveys all the same information and can easily grow into a compiler-checked solution, if you were to upgrade the definitions from typedefs into distinct classes.

Before anyone revokes my programming license, I’m not saying people shouldn’t comment. Yet a programmer only has twenty-four hours in a day (disregarding sleep, of course)… and therefore any time invested in comments is energy that was not put into fixing the code so that comment wasn’t necessary. Naming is the same way—a creative name is something a user will not be able to appreciate in terms of runtime features.

So to bring our discussion back to the core of Hungarian Notation’s value for your programming mind, let’s look at the situation of someone naming a function parameter:

void DestroyWindow(HWND /*[name goes here]*/)
   {
   ...
   }

I could sit around all day debating whether to call it Input or ToBeDestroyedWindow or TheWindowIWillDestroy or cryptically x. Yet what this variable represents is obvious from the context—after all, it is the sole parameter to a routine called DestroyWindow. It’s the window to destroy (Duh!)

One way of avoiding a meaningless name would be to give the variable some unique number:

void DestroyWindow(HWND noname1231 /* hope this is unique! */)
   {
   ...
   }

In the future, our integrated development environments might be able to manage such numeric identities for declarations “behind the scenes”. This would be a lot like how databases invisibly manage table relationships through primary keys. But so long as we’re directly modifying textual code, humans can’t really match up these numbers while reading. So this is a bad idea.

But what if you made all your type names in capital letters? Then you could turn the type into lowercase letters to produce an available symbol:

void DestroyWindow(HWND hwnd)
   {
   ...
   }

Assuming that turning your type into lowercase doesn’t produce a language keyword, then you’ll always get a legal identifier. Moreover, the name isn’t completely useless: anywhere you see a reference to this “nameless” variable you know at least one thing about it: its type.

Naturally, this method of producing a unique symbol breaks down once you have more than one variable of a specific type in a scope:

HWND hwnd; // topmost window in the Z order
HWND hwnd; // parent window (**ERROR, NAME ALREADY USED**)

Yet if there’s more than one variable of the same type in a scope, that means that context alone is by definition insufficient to explain your variable’s purpose. Hungarian Notation prescribes adding a disambiguating mixed-case phrase to the end of the name. It is especially efficient, because the disambiguation always tacks onto the end of names—which is easy to add and remove in the editor if you ever run up against a collision:

HWND hwndTopmostInZOrder;
HWND hwndParent;

This only works if you create lots of new types. If you have an integer value, and you prefix a variable with the letter “i”, context is probably not sufficient to know what the variable is for. So that begs the question: why are you using an integer and not a higher level abstraction like “line number” (LINE), “count of employees” (CEMP), or a “stack depth counter” (SDC)?

If the Y2K problem taught us anything, it’s that you should be very liberal in creating new types which capture your ideas—even if it’s just a measly preprocessor macro. Compare:

BYTE todays_date_in_MM_DD_YY[3];

with:

#define DATE BYTE[3]
DATE date;

Even though the implementations are isomorphic, the second pattern is far better. There is no automatic way to find all the dates in the first example…you have to do manual inspection of all the names of BYTE arrays to figure out which are dates and which are not. This is why I don’t hesitate to create new types while I program, even wrappers for basic types like int or long. Once you’ve done that, you can feel good about variable declarations like LINE lineFirst; or CEMP cempHiredLastMonth; or just SDC sdc;.

If you aren’t making tons of types, and just sticking “i” or “l” in front of everything, forget about Hungarian. It will be useless, and makes what Linus Torvalds says absolutely true:

“Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer.”

I’m not surprised Hungarian Notation has a terrible reputation, because I’ve never seen published code that used it right. Microsoft even screwed it up in their most public APIs! Just look at the definition for window procedures:

long FAR PASCAL WndProc(
   HWND hwnd,
   UINT msg,
   UINT wParam,
   LONG lParam);

The only part they did right is the HWND. The rest is a complete mess. A more genuine attempt would probably look like:

typedef UINT WPARAM;
typedef LONG LPARAM;
typedef LONG LRESULT;
typedef UINT WM; // (W)indows (M)essage
 
LRESULT FAR PASCAL LresultWndproc(
   HWND hwnd,
   WM wm,
   WPARAM wparam,
   LPARAM lparam);

Some very reasonable people suggest that since so few programmers have grasped the “true” spirit of Hungarian Notation, something must be inherently confusing about it. Therefore nearly everyone should avoid it.

I mostly agree.

Yet I simply can’t stand being forced to give a meaningless name to something which is obvious from its context. That’s why I came up with a compromise. I simply make a mental association of a shorthand for each data type that I’m using (such as “WND” for System.Window) without changing the definition. This means code starts to look like:

void DestroyWindow(System.Window wnd)
   {
   System.Window wndParent = wnd.GetParent();
   System.Window wndTemp;
   foreach (wndTemp in wndParent.Children())
      {
      if (wndTemp == wnd)
         ...
      }
   }

This gives me what I desire, and avoids the taboo of data types that are in all capital letters. (I don’t know why, but people have reserved all caps for naming constants. It’s a convention that never made sense to me—mixed case constants are more readable. Oh well.)

One quirk I have adopted is to use “is” as the prefix for booleans. I still think it’s in the spirit of Hungarian, and boolean isTheUserOnline reads a bit better than boolean boolUserOnline. Here’s another example demonstrating my naming technique:

void KickUserOffline(UserObject userKick, UserObject userAdministrator)
   {
   assert(userAdministrator.HasPrivilege(PRIVILEGE_KICK));
   if (userKick.isTheUserOnline)
      {
      userKick.Message("You've been kicked off.");
      userKick.Logout();
      }
   }

I hope I’ve presented this clearly. The code examples that appear on HostileFork.com will use this technique where possible, and I welcome feedback if someone has a better way.

“Freedom To” and “Freedom From” in Software Architecture

Monday, July 4th, 2005

A lot of programmers are drawn to the field of computer science precisely because there were no rules restricting the worlds they created. In some sense, we do rely upon the power of “being able to do anything” in order to unleash the creative and technical power of computers for solving problems in new ways. Yet experienced developers generally realize that choosing a development platform with more narrow capabilities can actually help focus the problem and produce a better product—and that those restrictions are there for their own good!

This seeming paradox is actually a pervasive Zen Truth, which shows up in every area of philosophical inquiry. I like to borrow terminology from the political theorists, who have articulated the two types of freedom: “freedom to” and “freedom from”. You might say that each restriction placed on a software developer is a limitation of their “freedom to”, which provides someone else (another developer or the user themselves) a “freedom from”. The classic statement “your right to swing your arm ends where my nose begins” captures the importance of balancing the power of the individual and the guarantees offered to others.

The balance comes into play no matter how you design a system; you cannot “opt-out” of the decisions! Any failure on your part to put a restriction in place is an implicit declaration that everything is permitted. That may have been an expedient choice in the Wild West of computer history, but software is now a tightly intertwined urban landscape…where lawlessness breeds chaos. Although security restrictions are experiencing a boon due to the prevalence of hackers, very little emphasis is currently being placed on imposing clever restrictions on an application’s internal structure. If your computer were a country, it’s as if there are detailed immigration policies but no one is bothering to enforce traffic laws!

When I think about most of the tools and frameworks that are being marketed today, I’m reminded of the story of Stone Soup. I’ll paraphrase the fable like this:

A mischevious individual sits with a stone and some water in a pot. Curious villagers come to see what’s being made, and the stranger tells them that it’s “stone soup”. The people doubt that you can really make soup from a stone, but they sit and wait to watch and see if anything happens.

Man serving stone soup

As the time passes, the chef mourns the fact that there’s no salt, because that would make the stone soup “much better”. Someone runs home to get salt to put in the pot. Then the chef mentions that there are no carrots, which is too bad since carrots really give that extra “oomph” to stone soup. That leads another person to run to their garden to get the ingredient, and this pattern is repeated a few times. Eventually it makes a great pot of soup—leaving the villagers in awe of the magic stone’s power.

Usually the man with the stone is seen as a hero for getting the hungry villagers to share their ingredients with the community instead of hoarding them. Thus the moral of the story is that when you catalyze a group so that everyone contributes what they can, a greater good is achieved. But I always felt like a more obvious moral was important:

If you provide all the seasonings, vegetables, and meat then you will have soup—stone or no stone!

To recast this lesson into engineering terminology:

Projects can end up succeeding in spite of infrastructure…rather than because of it!

Like these hungry villagers, programming communities often achieve outstanding results by using tools which have no more to do with their success than the stone. In fact, software developers are especially vulnerable to stone soup salesmen because of the discoveries often attributed to Alan Turing. Since computing is universal, even bad tools can’t stop you from solving any problem you need to. The only place new computing “power” comes into the picture is when a platform clearly defines which problems it doesn’t solve—so that it may be closer fit to the problems you are interested in.

Thus, to truly find advantage in a new platform, it must be very specific about the limitations it created to reshape its environment in a manner more catalytic than the Universal Computation stone!


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported