Programmers typically assume that a MouseUp() message won’t happen unless they had previously received a MouseDown()… and that these signals will come in precise pairs. Yet in almost every modern system that runs multiple applications at once, you will hit edge cases that send your program a MouseUp() when no MouseDown() ever happened—or two MouseDown() messages in a row.
There is no well-studied example of how to write your mouse handling code in a way that accounts for these cases. As a consequence, merely task-switching while still in the middle of a mouse operation will send a lot of applications into unexpected conditions! Many will crash or assert when you return… and those that don’t crash often do something bizarre. So I thought I would make a screencast of a prototype I made in 2002 which insulates programmers from these concerns. Even if you’re not a programmer and don’t want to read the whole article about the implementation, you might think the feature itself is unique, so check it out!
(Note: I use alt-tab to take the focus away in mid-mouse stroke, because that was easy to choreograph. But of course the technique is more compelling when an application in the background jumps forward and “steals” the focus.)
As you can see, my library took control of the mouse message pump and reduced the concerns of the programmer. If a mouse gesture was interrupted somehow I didn’t cancel the operation (nor did I just pretend the mouse button had been released and commit it). Instead, the library put the application into a suspended state with a placeholder icon at the last known mouse position. Clicking inside the placeholder restores your cursor to the previous coordinates and resumes the mouse operation, while pressing escape lets you abort.
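To make the idea concrete, here is a minimal Python sketch of the kind of state machine such a library might use. It is not the 2002 prototype's code: `GestureTracker`, its method names, and the 8-pixel placeholder radius are all assumptions made for illustration. The point is only that raw events get filtered so the application sees well-formed gestures.

```python
from enum import Enum, auto


class GestureState(Enum):
    IDLE = auto()
    ACTIVE = auto()
    SUSPENDED = auto()


class GestureTracker:
    """Hypothetical filter that turns raw mouse events into well-formed gestures."""

    def __init__(self):
        self.state = GestureState.IDLE
        self.last_pos = None  # last known mouse position during a gesture
        self.log = []         # what the application would actually be told

    def mouse_down(self, pos):
        if self.state is GestureState.IDLE:
            self.state = GestureState.ACTIVE
            self.last_pos = pos
            self.log.append(("begin", pos))
        elif self.state is GestureState.SUSPENDED and self._hits_placeholder(pos):
            # Clicking the placeholder resumes at the old coordinates.
            self.state = GestureState.ACTIVE
            self.log.append(("resume", self.last_pos))
        # A second mouse-down while ACTIVE is a stray event and is ignored.

    def mouse_move(self, pos):
        if self.state is GestureState.ACTIVE:
            self.last_pos = pos
            self.log.append(("move", pos))

    def mouse_up(self, pos):
        if self.state is GestureState.ACTIVE:
            self.state = GestureState.IDLE
            self.log.append(("commit", pos))
        # A mouse-up with no active gesture is a stray event and is ignored.

    def focus_lost(self):
        if self.state is GestureState.ACTIVE:
            self.state = GestureState.SUSPENDED
            self.log.append(("suspend", self.last_pos))

    def escape_pressed(self):
        if self.state is GestureState.SUSPENDED:
            self.state = GestureState.IDLE
            self.log.append(("abort", self.last_pos))

    def _hits_placeholder(self, pos, radius=8):
        # Assumed hit test: a small square around the last known position.
        x, y = pos
        px, py = self.last_pos
        return abs(x - px) <= radius and abs(y - py) <= radius
```

With this shape, the application code only ever reacts to `begin`, `move`, `suspend`, `resume`, `commit`, and `abort`, and never has to ask whether a mouse-up really had a matching mouse-down.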
Use Command objects that only run on MouseUp
In the past I’ve written about the importance of designing your program’s command processor in such a way that undo and redo operate consistently. One of the rules I mentioned was that your command processor shouldn’t be modifying the user’s document on MouseDown() or MouseMove(), but should accumulate the state in a Command object that is only submitted to your Undo/Redo queue when the MouseUp() is finally reached.
The good news is that if you’ve done that part right, then most of the necessary support work for this feature is already done!
A Command object not only makes things clearer for undo and redo—it also gives you a fantastic way of holding a mouse operation in “suspended animation”. The counterintuitive aspect is that pending commands must participate in the rendering process—since the document’s state alone is not enough to draw the view.
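Here is a rough Python sketch of that arrangement, assuming a toy paint program. `Document`, `StrokeCommand`, and `Canvas` are hypothetical names, not part of any real library. The document is only modified in `run()`, which is called on mouse-up, while `render()` draws the committed strokes plus whatever the pending command has accumulated so far.

```python
class Document:
    def __init__(self):
        self.strokes = []  # committed strokes only


class StrokeCommand:
    """Accumulates points during a drag; touches the document only in run()."""

    def __init__(self):
        self.points = []

    def add_point(self, pos):
        self.points.append(pos)

    def run(self, doc):
        doc.strokes.append(list(self.points))

    def undo(self, doc):
        # Assumes the document is exactly as run() left it.
        doc.strokes.pop()


class Canvas:
    def __init__(self):
        self.doc = Document()
        self.undo_stack = []
        self.pending = None  # command being built by the current drag, if any

    def mouse_down(self, pos):
        self.pending = StrokeCommand()
        self.pending.add_point(pos)

    def mouse_move(self, pos):
        if self.pending is not None:
            self.pending.add_point(pos)

    def mouse_up(self, pos):
        if self.pending is None:
            return
        self.pending.add_point(pos)
        self.pending.run(self.doc)          # first and only document write
        self.undo_stack.append(self.pending)
        self.pending = None

    def render(self):
        """Return everything to draw: committed strokes plus the pending preview."""
        visible = [list(stroke) for stroke in self.doc.strokes]
        if self.pending is not None:
            visible.append(list(self.pending.points))
        return visible
```

Because the half-finished stroke lives entirely in `pending`, suspending the gesture is just a matter of keeping that object around, and aborting is just a matter of dropping it.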
Build on Drag&Drop APIs, not mouse messages
One issue that I really had to grapple with was how to retain some control of the mouse cursor even when it had left my application. Sadly, calling SetCapture() on Windows still means that the cursor will turn into the default arrow after the focus is lost. The way I found to work around this on Windows was with the Drag&Drop APIs, which turned out to be “tighter” than the default mouse API.
On Windows and other platforms, the Drag&Drop methods are precise analogues to the mouse messages we are familiar with:
Windows Explorer has a rather odd quirk when you rename a file. If you happen to have several items selected, all those selected files are given the exact same name. I don’t like the behavior and it only ever happens to me by accident, but Microsoft documented it… so I’ll let it slide for now.
What I *will* complain about is what happens when I try to undo one of these multi-renames. Despite only running one command, you have to press undo multiple times to restore your state! You have to actually hit undo for each file you had selected. (Adding injury to insult, Windows only keeps 10 undo items at a time—so you *can’t* get back to your initial state if you had 11 or more files selected.)
I demonstrate this defect for you in the video below. Windows Explorer isn’t the only program that does this, so after the video I’ve written about how we can architect our software so that it will never require more than one undo per “user command”:
Why would this happen?
To understand why this happens, you have to know a little bit about how a typical undo manager works. Most applications simply have a list of objects representing recent commands that have been executed. These “Command” objects usually have the following methods:
“Run” (or “Do”): to execute the command initially, while storing enough information inside the command object that it may revert the effect
“Undo”: remove the changes the command made by using the stored information (assuming the relevant application state is in precisely the condition it was in right after the command finished)
“Redo”: bring back the effects of the command using the stored information (assuming the relevant application state is in precisely the condition it was in when the command was initially run)
It might seem that “Run” and “Redo” are redundant, but they actually perform different functions. Just imagine a command that inserts the current time into your word processing document. If someone runs this at 4:00 and then undoes it at 5:00, they expect a redo moments later to restore the “4:00” they just undid… not inject “5:01”! The undo and redo methods merely play back the action data that was stored during the command, by design.
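The difference is easy to see in a minimal Python sketch. Everything here is an assumption made for illustration: the document is modeled as a plain list of strings, and `InsertTimeCommand` takes a `clock` callable so the example does not depend on the real time.

```python
class InsertTimeCommand:
    """Hypothetical command that appends the current time to a document."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time as text
        self._text = None    # captured once, during run()
        self._index = None

    def run(self, doc):
        # Only run() consults the clock; it stores what it needs for later.
        self._text = self._clock()
        self._index = len(doc)
        doc.append(self._text)

    def undo(self, doc):
        # Assumes doc is exactly as run() (or redo()) left it.
        del doc[self._index]

    def redo(self, doc):
        # Replays the stored text; the clock is deliberately not consulted.
        doc.insert(self._index, self._text)
```

If `redo()` simply called `run()` again, the second call would read the clock a second time and insert the later time, which is exactly the surprise described above.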
Once you understand that a simple command processor will only undo one “Command” at a time, it becomes easy to intuit what Windows Explorer’s problem must be. Somewhere in the shell there is code which looks vaguely like this:
Then when it came time to implement the multiple renaming facility, the Microsoft programmers didn’t modify RenameCommand to take a list of files. They simply told the command processor to invoke several of them in series:
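A rough Python sketch of the situation described in the last two paragraphs follows. This is not the shell's actual code. `RenameCommand`, `CommandProcessor`, and `rename_selection` are hypothetical names, and the file system is modeled as a dictionary from file id to display name.

```python
class RenameCommand:
    """Renames a single file and remembers the old name for undo."""

    def __init__(self, file_id, new_name):
        self.file_id = file_id
        self.new_name = new_name
        self.old_name = None

    def run(self, names):
        self.old_name = names[self.file_id]
        names[self.file_id] = self.new_name

    def undo(self, names):
        names[self.file_id] = self.old_name


class CommandProcessor:
    """Simple processor: one undo-stack entry per executed command."""

    def __init__(self, names):
        self.names = names
        self.undo_stack = []

    def execute(self, command):
        command.run(self.names)
        self.undo_stack.append(command)

    def undo(self):
        if self.undo_stack:
            self.undo_stack.pop().undo(self.names)


def rename_selection(processor, selected_ids, new_name):
    # The shortcut: one single-file command per selected file, run in series.
    for file_id in selected_ids:
        processor.execute(RenameCommand(file_id, new_name))
```

One user action with three files selected leaves three entries on the undo stack, so the user has to press undo three times to get back to where they started.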
Yet merely exposing a grouping API such as BeginUndoGroup() and EndUndoGroup() from your command processor will not protect against the bug in any general sense. What will solve the general problem is if you only permit a few spots in the main UI loop to access BeginUndoGroup() and EndUndoGroup(). Those special places are the points where the user triggered an event that they consider “conceptually atomic”. Some good examples of these privileged moments are:
a key press or running an accelerator key
the selection of a menu item
the pushing of a toolbar button
the release of the mouse after a dragging operation
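As a minimal Python sketch of this arrangement: `begin_undo_group()` and `end_undo_group()` stand in for BeginUndoGroup() and EndUndoGroup(), while `GroupingProcessor`, `SetValue`, and `dispatch_user_event` are hypothetical names invented for the example. The document is modeled as a plain dictionary whose keys already exist.

```python
class SetValue:
    """Toy command: overwrite one existing key and remember the old value."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.old = None

    def run(self, state):
        self.old = state[self.key]
        state[self.key] = self.value

    def undo(self, state):
        state[self.key] = self.old


class GroupingProcessor:
    def __init__(self, state):
        self.state = state
        self.undo_stack = []      # each entry is a list of commands
        self._open_group = None

    def begin_undo_group(self):
        if self._open_group is not None:
            raise RuntimeError("an undo group is already open")
        self._open_group = []

    def end_undo_group(self):
        if self._open_group is None:
            raise RuntimeError("no undo group is open")
        group, self._open_group = self._open_group, None
        if group:                 # events that changed nothing leave no entry
            self.undo_stack.append(group)

    def execute(self, command):
        if self._open_group is None:
            raise RuntimeError("command executed outside a user event")
        command.run(self.state)
        self._open_group.append(command)

    def undo(self):
        if self.undo_stack:
            # Revert the whole group, newest command first.
            for command in reversed(self.undo_stack.pop()):
                command.undo(self.state)


def dispatch_user_event(processor, handler):
    """The one place in the UI loop that opens and closes undo groups."""
    processor.begin_undo_group()
    try:
        handler(processor)
    finally:
        processor.end_undo_group()
```

Handlers for key presses, menu items, toolbar buttons, and mouse releases would all be routed through `dispatch_user_event`, so however many commands a handler runs, the user gets exactly one undo step for it.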
Obviously each individual mouse movement shouldn’t generate an undo group—in a paint program you wouldn’t want to have to undo each pixel from a brush stroke! Yet simply calling BeginUndoGroup() when the mouse goes down and then EndUndoGroup() when the mouse goes up isn’t the ideal solution. The problem is better solved by not allowing the mouse action to submit a command to the command processor until the mouse button is released. Until that time, the program just accumulates state from the mouse’s movement that will ultimately be used by the command’s Run() method.
There are so many benefits to deferring the calls to BeginUndoGroup() and EndUndoGroup() until the mouse button is released that I’d have a hard time condensing them all here. The features this enables warrant articles of their own! Savvy GUI developers can probably guess that most of the practical benefits relate to not having to pump UI messages while the command processor is inside a “transaction”. Yet there are more fun results, such as the ability to gracefully suspend a mouse operation when an application loses focus.
Protecting against stray document modifications
One way to get even more power out of this architectural pattern is to add runtime checks to ensure that none of your documents can be written to unless a BeginUndoGroup() is in effect. This way you protect yourself from writing a program that persistently modifies a user’s document while they are merely hovering over the application’s window, or running in the idle loop. During these times there would be no user event with which to associate the effects, so the undo behavior is going to seem random.
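A runtime check of this kind could look something like the following Python sketch. `UndoContext` and `GuardedDocument` are hypothetical names, and a real document class would guard every mutating method rather than just one.

```python
class UndoContext:
    """Tracks whether a user-motivated undo group is currently in effect."""

    def __init__(self):
        self.group_open = False

    def begin_undo_group(self):
        self.group_open = True

    def end_undo_group(self):
        self.group_open = False


class GuardedDocument:
    """Document that refuses writes unless an undo group is open."""

    def __init__(self, context):
        self._context = context
        self._items = []

    def items(self):
        # Reads are always allowed; return a copy so callers cannot mutate.
        return tuple(self._items)

    def append(self, item):
        self._require_group()
        self._items.append(item)

    def _require_group(self):
        if not self._context.group_open:
            raise RuntimeError("document modified outside an undo group")
```

Code running from a hover handler or the idle loop that tries to call `append()` fails immediately during development, instead of quietly producing a change that no undo step can account for.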
I know many programs as they are currently written would choke on these strict rules—but look closely and ask yourself how you expected the undo/redo to work otherwise? You probably are just hiding bugs very much like the one in Explorer above. Ensuring that a user-motivated undo group is always in effect before invoking a command with the power to make document modifications will protect against a number of awkward scenarios.
(Note for C++ programmers: to take this even further and protect against stray document modifications at compile-time, try my suggestion of “extreme” use of const’s transitive power! If the only place your application gives out non-const pointers is as a parameter to the Command.Run() method, you guard against all kinds of accidents.)
I do want to add the caveat that there are probably scenarios where you want to hit undo fewer times than the number of user events. In Microsoft Word, typing a sentence and hitting Ctrl-Z will remove the whole sentence, and that’s sometimes what the user wants. Yet even in these cases of providing higher-order undo commands (which I approve of), the user should still always be able to undo on a per-operation basis.
Also, there is no way to avoid the multiple-undo situation when a third-party software tool is directly manipulating the user interface of an application on a user’s behalf. A devil’s advocate might even argue that the bug in the Windows Shell happens because renaming multiple files isn’t an intrinsic function of Windows Explorer, but rather a convenience provided by a separate “Windows Shell Extension Tool”. Yet this is hogwash, since any system advertising an extensible architecture should be able to handle those plug-ins gracefully within its undo model.
In summary: I am convinced that requiring undo more times than the number of user events is a sign of poor design. Your undo/redo model will be clean and solid if you manage your undo groupings according to the guidelines above. I’d love to hear any success or failure stories people have of working with this approach, so please comment.