Archive for November, 2007

Mouse Placeholders for when Programs Lose Focus

Sunday, November 25th, 2007

Programmers typically assume that a MouseUp() message won’t happen unless they had previously received a MouseDown()… and that these signals will come in precise pairs. Yet in almost every modern system that runs multiple applications at once, you will hit edge cases that send your program a MouseUp() when no MouseDown() ever happened—or two MouseDown() messages in a row.

There is no well-studied example of how to write your mouse handling code in a way that accounts for these cases. As a consequence, merely task-switching while still in the middle of a mouse operation will send a lot of applications into unexpected conditions! Many will crash or assert when you return… and those that don’t crash often do something bizarre. So I thought I would make a screencast of a prototype I made in 2002 which insulates programmers from these concerns. Even if you’re not a programmer and don’t want to read the whole article about the implementation, you might think the feature itself is unique, so check it out!

(Note: I use alt-tab to take the focus away in mid-mouse stroke, because that was easy to choreograph. But of course the technique is more compelling when an application in the background jumps forward and “steals” the focus.)

As you can see, my library took control of the mouse message pump and reduced the concerns of the programmer. If a mouse gesture was interrupted somehow I didn’t cancel the operation (nor did I just pretend the mouse button had been released and commit it). Instead, the library put the application into a suspended state with a placeholder icon at the last known mouse position. Clicking inside the placeholder restores your cursor to the previous coordinates and resumes the mouse operation, while pressing escape lets you abort.

Use Command objects that only run on MouseUp

In the past I’ve written about the importance of designing your program’s command processor in such a way that undo and redo operate consistently. One of the rules I mentioned was that your command processor shouldn’t be modifying the user’s document on MouseDown() or MouseMove(), but should accumulate the state in a Command object that is only submitted to your Undo/Redo queue when the MouseUp() is finally reached.

The good news is that if you’ve done that part right, then most of the necessary support work for this feature is already done!

A Command object not only makes things clearer for undo and redo—it also gives you a fantastic way of holding a mouse operation in “suspended animation”. The counterintuitive aspect is that pending commands must participate in the rendering process—since the document’s state alone is not enough to draw the view.
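
To make the idea concrete, here is a minimal C++ sketch. It is not the code from the 2002 prototype; the names Document, DrawLineCommand and View are invented for illustration, and the "document" is assumed to be nothing more than a list of line segments. The pending command gathers state during the drag, the renderer draws the committed segments plus the pending preview, and the document is only touched on MouseUp().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical document: just a list of committed line segments.
struct Point { int x; int y; };
struct Segment { Point from; Point to; };
struct Document { std::vector<Segment> segments; };

// A pending "draw line" command. It gathers state while the mouse moves,
// but only touches the document when Run() is called.
class DrawLineCommand
  {
public:
  explicit DrawLineCommand(Point start) : from(start), to(start) {}

  void UpdateEndpoint(Point p) { to = p; }        // called from MouseMove
  Segment Preview() const { return {from, to}; }  // used by the renderer

  void Run(Document& doc) { doc.segments.push_back({from, to}); }
  void Undo(Document& doc) { doc.segments.pop_back(); }

private:
  Point from;
  Point to;
  };

// Hypothetical view that owns at most one pending command.
class View
  {
public:
  explicit View(Document& d) : doc(d) {}

  void MouseDown(Point p) { pending = std::make_unique<DrawLineCommand>(p); }
  void MouseMove(Point p) { if (pending) pending->UpdateEndpoint(p); }
  void MouseUp()
    {
    if (!pending) return;              // stray MouseUp: nothing to commit
    pending->Run(doc);
    undoQueue.push_back(std::move(pending));   // pending is null afterwards
    }

  // Rendering = committed document state + the pending command's preview.
  std::vector<Segment> SegmentsToDraw() const
    {
    std::vector<Segment> out = doc.segments;
    if (pending) out.push_back(pending->Preview());
    return out;
    }

  std::size_t UndoDepth() const { return undoQueue.size(); }

private:
  Document& doc;
  std::unique_ptr<DrawLineCommand> pending;
  std::vector<std::unique_ptr<DrawLineCommand>> undoQueue;
  };
```

Because the half-finished stroke lives entirely inside the pending object, "suspending" the operation in this sketch amounts to simply not delivering any more mouse events to the View until the user resumes or aborts.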

Build on Drag&Drop APIs, not mouse messages

One issue that I really had to grapple with was how to retain some control of the mouse cursor even when it had left my application. Sadly, running SetCapture() on Windows still means that the cursor will turn into the default arrow after the focus is lost. The way I found to work around this on Windows was with the Drag&Drop APIs—which turned out to be “tighter” than the default mouse API.

On Windows and other platforms, the Drag&Drop methods are precise analogues to the mouse messages we are familiar with:

  1. DragStart() = MouseDown()
  2. DragOver() = MouseMove()
  3. Drop() = MouseUp()

(more…)

Tying Undo/Redo Actions to a Single User Event

Sunday, November 25th, 2007

Windows Explorer has a rather odd quirk when you rename a file. If you happen to have several items selected, all those selected files are given the exact same name. I don’t like the behavior and it only ever happens to me by accident, but Microsoft documented it… so I’ll let it slide for now.

What I *will* complain about is what happens when I try to undo one of these multi-renames. Despite only running one command, you have to press undo multiple times to restore your state! You have to actually hit undo for each file you had selected. (Adding injury to insult, Windows only keeps 10 undo items at a time—so you *can’t* get back to your initial state if you had 11 or more files selected.)

I demonstrate this defect for you in the video below. Windows Explorer isn’t the only program that does this, so after the video I’ve written about how we can architect our software so that it will never require more than one undo per “user command”:

Why would this happen?

To understand why this happens, you have to know a little bit about how a typical undo manager works. Most applications simply have a list of objects representing recent commands that have been executed. These “Command” objects usually have the following methods:

  1. “Run” (or “Do”): to execute the command initially, while storing enough information inside the command object that it may revert the effect
  2. “Undo”: remove the changes the command made by using the stored information (assuming the relevant application state is in the precise condition it was in just after the command finished)
  3. “Redo”: bring back the effects of the command using the stored information (assuming the relevant application state is in the precise condition it was in when the command was initially run)

It might seem that “Run” and “Redo” are redundant, but they actually perform different functions. Just imagine a command that inserts the current time into your word processing document. If someone runs this at 4:00 and then undoes it at 5:00, they expect a redo moments later to restore the “4:00” they just undid…not inject “5:01”! The undo and redo methods merely play back the action data that was stored during the command, by design.
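
Here is a small C++ sketch of that timestamp example. InsertTimeCommand is a made-up class, and the clock reading is passed into Run() as a string (an assumption that keeps the example deterministic) rather than read from the system clock.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical command that appends a timestamp to a text buffer.
class InsertTimeCommand
  {
public:
  // Run: consult the outside world (the clock) and remember what was inserted.
  void Run(std::string& text, const std::string& clockNow)
    {
    inserted = clockNow;
    position = text.size();
    text += inserted;
    }

  // Undo: remove exactly what Run (or Redo) put in.
  void Undo(std::string& text)
    {
    assert(text.compare(position, inserted.size(), inserted) == 0);
    text.erase(position, inserted.size());
    }

  // Redo: replay the stored data; the clock is never consulted again.
  void Redo(std::string& text)
    {
    assert(text.size() == position);
    text.insert(position, inserted);
    }

private:
  std::string inserted;
  std::size_t position = 0;
  };
```

Note that Redo() takes no clock argument at all, so it has no way to produce “5:01” even by mistake.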

Once you understand that a simple command processor will only undo one “Command” at a time, it becomes easy to intuit what Windows Explorer’s problem must be. Somewhere in the shell there is code which looks vaguely like this:

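class RenameCommand : public Command
  {
  string filenameNew;
  string filenameOld;   // captured by Run() so that Undo() can restore it
  FileHandle fh;
 
public:
  RenameCommand(FileHandle fh, string filenameNew)
    {
    this->fh = fh;
    this->filenameNew = filenameNew;
    }
 
  // Perform the rename, remembering the previous name first
  void Run()
    {
    filenameOld = fh->GetCurrentName();
    fh->SetName(filenameNew);
    }
 
  // Put the old name back (the file must still carry the new name)
  void Undo()
    {
    assert(fh->GetCurrentName() == filenameNew);
    fh->SetName(filenameOld);
    }
 
  // Reapply the new name (the file must be back at its old name)
  void Redo()
    {
    assert(fh->GetCurrentName() == filenameOld);
    fh->SetName(filenameNew);
    }
  };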

Then when it came time to implement the multiple renaming facility, the Microsoft programmers didn’t modify RenameCommand to take a list of files. They simply told the command processor to invoke several of them in series:

string filenameNew = Shell.GetNameFromUser();
FileHandle fh;
foreach (fh, Shell.GetSelectedFileHandles())
  {
  Command* cmd = new RenameCommand(fh, filenameNew);
  commandProcessor.RunUndoableCommand(cmd);
  }

This is where they get into trouble. They’ve added multiple undo items to the command processor’s list for what conceptually should have been a single command.
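
For comparison, a version of the command that takes the whole selection could look like the sketch below. This is not how the shell is written; FileTable (a map from a made-up file id to its current name) stands in for the real file system, and RenameManyCommand is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the shell's file table: file id -> current name.
using FileTable = std::map<int, std::string>;

// One command object covering every selected file, so one undo reverts all.
class RenameManyCommand
  {
public:
  RenameManyCommand(std::vector<int> fileIds, std::string name)
    : ids(std::move(fileIds)), nameNew(std::move(name)) {}

  void Run(FileTable& files)
    {
    namesOld.clear();
    for (int id : ids)
      {
      namesOld.push_back(files.at(id));   // remember each old name
      files.at(id) = nameNew;
      }
    }

  void Undo(FileTable& files)
    {
    for (std::size_t i = 0; i < ids.size(); ++i)
      files.at(ids[i]) = namesOld[i];
    }

  void Redo(FileTable& files)
    {
    for (int id : ids)
      files.at(id) = nameNew;
    }

private:
  std::vector<int> ids;
  std::string nameNew;
  std::vector<std::string> namesOld;   // parallel to ids, filled by Run()
  };
```

This fixes the one case, but every command author would have to remember to do the same thing by hand.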

Tying transactions directly to user events

A naive workaround would be to add the idea of a transaction that encapsulates multiple commands into an “undo group”:

commandProcessor.BeginUndoGroup();
 
string nameNew = Shell.GetNameFromUser();
FileHandle fh;
foreach (fh, Shell.GetSelectedFileHandles())
  {
  Command* cmd = new RenameCommand(fh, nameNew);
  commandProcessor.RunUndoableCommand(cmd);
  }
 
commandProcessor.EndUndoGroup();

Yet exposing an API like this from your command processor will not protect against the bug in any general sense. What will solve the general problem is if you only permit a few spots in the main UI loop to access BeginUndoGroup() and EndUndoGroup(). Those special places are the points where the user triggered an event that they consider “conceptually atomic”. Some good examples of these privileged moments are:

  1. a key press or running an accelerator key
  2. the selection of a menu item
  3. the pushing of a toolbar button
  4. the release of the mouse after a dragging operation
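
One way to sketch this restriction in C++ is to make BeginUndoGroup() and EndUndoGroup() private and expose them only through a guard object that the main loop creates around each user event. Everything here is an assumption made for illustration: the Command struct is reduced to two closures, and UserEventScope is an invented name. C++ cannot actually stop other code from constructing the guard, so treat this as making the convention visible rather than enforcing it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// A command reduced to a pair of closures, to keep the sketch short.
struct Command
  {
  std::function<void()> run;
  std::function<void()> undo;
  };

class CommandProcessor
  {
public:
  // Commands may only run while a user-event group is open.
  void RunUndoableCommand(Command cmd)
    {
    assert(groupOpen && "command run outside of a user event");
    cmd.run();
    current.push_back(std::move(cmd));
    }

  // One call undoes everything the last user event did, newest first.
  void Undo()
    {
    if (groups.empty()) return;
    std::vector<Command>& group = groups.back();
    for (auto it = group.rbegin(); it != group.rend(); ++it)
      it->undo();
    groups.pop_back();
    }

  std::size_t UndoDepth() const { return groups.size(); }

private:
  friend class UserEventScope;   // the only code allowed to open a group

  void BeginUndoGroup() { assert(!groupOpen); groupOpen = true; }
  void EndUndoGroup()
    {
    groupOpen = false;
    if (!current.empty()) groups.push_back(std::move(current));
    current.clear();
    }

  bool groupOpen = false;
  std::vector<Command> current;               // commands of the open group
  std::vector<std::vector<Command>> groups;   // one entry per user event
  };

// Guard that the main UI loop creates around each key press,
// menu selection, toolbar click, or mouse release.
class UserEventScope
  {
public:
  explicit UserEventScope(CommandProcessor& p) : proc(p) { proc.BeginUndoGroup(); }
  ~UserEventScope() { proc.EndUndoGroup(); }
  UserEventScope(const UserEventScope&) = delete;
  UserEventScope& operator=(const UserEventScope&) = delete;

private:
  CommandProcessor& proc;
  };
```

With this shape, a handler that runs three rename commands inside one UserEventScope leaves exactly one entry on the undo list.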

Obviously each individual mouse movement shouldn’t generate an undo group—in a paint program you wouldn’t want to have to undo each pixel from a brush stroke! Yet simply calling BeginUndoGroup() when the mouse goes down and then EndUndoGroup() when the mouse goes up isn’t the ideal solution. The problem is better solved by not allowing the mouse action to submit a command to the command processor until the mouse button is released. Until that time, the program just accumulates state from the mouse’s movement that will ultimately be used by the command’s Run() method.

There are so many benefits to deferring the calls to BeginUndoGroup() and EndUndoGroup() until the mouse button is released that I’d have a hard time condensing them all here. The features this enables warrant articles of their own! Savvy GUI developers can probably guess that most of the practical benefits relate to not having to pump UI messages while the command processor is inside a “transaction”. Yet there are more fun results, such as the ability to gracefully suspend a mouse operation when an application loses focus.

Protecting against stray document modifications

One way to get even more power out of this architectural pattern is to add runtime checks to ensure that none of your documents can be written to unless a BeginUndoGroup() is in effect. This way you protect yourself from writing a program that persistently modifies a user’s document while they are merely hovering over the application’s window, or running in the idle loop. During these times there would be no user event with which to associate the effects, so the undo behavior is going to seem random.
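
A runtime check of this kind can be very small. The sketch below assumes a toy GuardedDocument class (invented for illustration) whose only mutation is appending text, and it throws an exception where a real program might prefer an assert.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical document that refuses writes unless an undo group is open.
class GuardedDocument
  {
public:
  const std::string& Text() const { return text; }   // reads are always fine

  void Append(const std::string& s)
    {
    if (!undoGroupOpen)
      throw std::logic_error("document modified outside of an undo group");
    text += s;
    }

  // In a real program only the command processor would flip this flag.
  void SetUndoGroupOpen(bool open) { undoGroupOpen = open; }

private:
  std::string text;
  bool undoGroupOpen = false;
  };
```

Any code path that tries to write during hover or idle processing then fails loudly the first time it runs, instead of quietly producing an edit that undo knows nothing about.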

I know many programs as they are currently written would choke on these strict rules—but look closely and ask yourself how you expected the undo/redo to work otherwise? You probably are just hiding bugs very much like the one in Explorer above. Ensuring that a user-motivated undo group is always in effect before invoking a command with the power to make document modifications will protect against a number of awkward scenarios.

(Note for C++ programmers: to take this even farther and protect against stray document modifications at compile-time, try my suggestion of “extreme” use of const’s transitive power! If the only place your application gives out non-const pointers is as a parameter to the Command.Run() method, you guard against all kinds of accidents.)
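
As a rough illustration of the compile-time idea (a simplified sketch, not the full technique from that other article), the application below only ever hands out a const view of its document, and the single place a mutable reference appears is the call to Run(). The Application and SetTitleCommand classes are invented names.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Document { std::string title; };

// Only Run() is handed a mutable Document; everything else sees it as const.
class SetTitleCommand
  {
public:
  explicit SetTitleCommand(std::string t) : titleNew(std::move(t)) {}
  void Run(Document& doc) { titleOld = doc.title; doc.title = titleNew; }

private:
  std::string titleNew;
  std::string titleOld;   // kept so an Undo() could restore it
  };

class Application
  {
public:
  const Document& GetDocument() const { return doc; }    // read-only view for UI code
  void Execute(SetTitleCommand& cmd) { cmd.Run(doc); }   // the single mutable hand-off

private:
  Document doc;
  };

// UI code written against the const view cannot compile a stray write:
//   app.GetDocument().title = "oops";   // rejected by the compiler
```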

Final thoughts

I do want to add the caveat that there are probably scenarios where you want to hit undo fewer times than the number of user events. In Microsoft Word, typing a sentence and hitting Ctrl-Z will remove the whole sentence, and that’s sometimes what the user wants. Yet even in these cases of providing higher-order undo commands (which I approve of), the user should still always be able to undo on a per-operation basis.

Also, there is no way to avoid the multiple-undo situation when a third-party software tool is directly manipulating the user interface of an application on a user’s behalf. A devil’s advocate might even argue that the bug in the Windows Shell happens because renaming multiple files isn’t an intrinsic function of Windows Explorer, but rather a convenience provided by a separate “Windows Shell Extension Tool”. Yet this is hogwash, since any system advertising an extensible architecture should be able to handle those plug-ins gracefully within its undo model.

In summary: I am convinced that requiring undo more times than the number of user events is a sign of poor design. Your undo/redo model will be clean and solid if you manage your undo groupings according to the guidelines above. I’d love to hear any success or failure stories people have of working with this approach, so please comment.

Workaround for Firefox 2 Scroll Bar Bug on Mac

Tuesday, November 6th, 2007

Firefox 2 on the Mac has an unfortunate bug (#187435) which makes the scroll bars on lower regions “bleed through” and overwrite the areas above them. On traditional web pages that aren’t doing much tricky stuff, it will be only a minor visual nuisance. But it seriously impacts sites that are pushing the envelope of what HTML layout can do—like Mocha or the Extjs Web Desktop Demonstration:

Screenshot of the Firefox 2 scroll bar bug in the Extjs demo.

Phil Crosby has suggested that on the server side, it’s possible to adapt one’s code using the overflow:auto CSS property. He also notes that it can be tricky to implement:

Working with overflow:auto on elements can be a pain, since your height and width need to be precise, and different browsers need different amounts of margins to prevent triggering the scrollbars. The scrollbar problem was a good chunk of the UI dev work on jjot.com.

So sometimes this method won’t be right for you, and a client-side fix for Firefox 2.0 would help. I have a workaround that patches the Firefox browser using a theme. All you have to do is download this file and drag it into the themes list you get when you go to the Tools->Add Ons menu:

download aqua_scrollfix_together-1.0-fx.jar

When installation is complete, go back to the menu and click “Use Theme”. Note: If you have your Appearance palette set with scroll buttons to be “Top and Bottom” style instead of “Together”, then you’ll want to download this instead:

download aqua_scrollfix_topbottom-1.0-fx.jar

A smattering of other themes you can find on the web will also incidentally work around the problem, but these two are intended for those Mac users who don’t actually want to change the appearance of the browser. They’re exactly the code for default Firefox in every aspect but scrollbars. The only caveats are:

  • The workaround is using a theme, and Firefox doesn’t let themes draw inactive scroll bars differently from active ones.
  • It’s not a perfect pixel match for the Aqua scrollbars—bear in mind, we’re simulating them with quirky CSS. But it’s close enough that most people won’t notice. If there’s enough interest to warrant trying harder to match more exactly, I’ll cross that bridge at that time.

How does it work?

Firefox has the default behavior of delegating the drawing of scroll bars to the operating system. The bug in question arises from a miscommunication about rendering priority. So if you provide enough information in the skin that it can draw scroll bars on its own and not call the OS, the problem just goes away.

Implementing skinned scroll bars requires you to provide bitmaps that serve as the pieces of the scroll bar control. To give you an idea what these pieces look like, let’s examine some of the ones from the MacFox II Aqua theme by Kelly Cunningham, which uses graphics made by Alex W. If you bothered to rename the .jar file to a .zip and unzip it, here’s some of what you’d find in the global / scrollbar / directory:

Bitmap pieces of the scroll bar control from the theme.

One must take special care because on Mac OS X, the description of how these parts integrate is in the theme file global / nativescrollbars.css. For all other platforms it is in global / xulscrollbars.css. If you aren’t going to have different behavior on different platforms, then you can reduce the contents of both those files to a one-line shortcut that references a central global / scrollbars.css file:

@import url("chrome://global/skin/scrollbars.css");

MacFox II doesn’t do this, because it delegates to the OS if you’re on a Mac. This means it still has the bug. So I changed it to always draw the scroll bars, which quickly exposed another problem: the position of the buttons was being assumed to be “Together” on the same side of the thumb, while the graphics were drawn for being on the “Top and Bottom”. (This is something you can set in the Appearance palette of the System Preferences.)

To unmangle the scroll buttons, I used a tip from an article about scrollbar tweaks to position them the way the MacFox theme had expected. Yet I now realized I would need to make two variations, one for each possible user preference. Since I couldn’t find an existing theme with graphics and logic for the modern OS X curved buttons “Together”, that meant making my own.

I was unfamiliar with CSS, so the files were hard to edit at first. Especially since almost all the themes I looked at contained patterns like this one (from MacFox’s global / scrollbars.css):

scrollbarbutton[type="increment"] {
	position: absolute;
	margin: 0px 0px 0px -9px;
	min-width: 24px;
	min-height: 15px;
	background: url("chrome://global/skin/scrollbar/right_cut.png");
}

scrollbar[orient="vertical"] > scrollbarbutton[type="increment"] {
	position: absolute;
	margin: -9px 0px 0px 0px;
	min-width: 15px;
	min-height: 24px;
	background: url("chrome://global/skin/scrollbar/down_cut.png") no-repeat bottom left;
}

What this is saying is roughly:

  • Rule #1: All scrollbarbuttons you might find in the universe that increment the property to which they are attached—regardless of context—should be painted with the icon right_cut.png using the following attributes.
  • Rule #2: Oh…wait…!! There’s one context where that’s not the case. Override it if it’s a scrollbarbutton that increments and happens to be enclosed in the context of a vertical scrollbar, and use down_cut.png with these other attributes.

This is—in my view—an abuse of the inheritance mechanism. If you don’t remember to override all the characteristics you added to scrollbarbutton that gave it universal “horizontalness”, those attributes will get inherited by the vertical case, causing unexpected defaults to appear as you modify lines of code. It’s much better to have two narrow rules that apply explicitly to horizontal and vertical scrollbar contexts. (This is especially true because there are other places in Firefox that scrollbarbutton might appear and need to be handled differently.)

After fixing general problems of that type, I was able to more clearly determine what each line did. Eventually I discovered the position: absolute characteristic was not affecting position, but was indicating that those parts should be drawn in a higher Z-order than the relative pieces. The negative margin values were there to make sure the pieces overlapped a little bit, to allow for what I call the “nestling” of the thumb into the buttons.

I had to be a little tricky with the graphics to implement the curved gutter at the edge of the scroll bar region. This involved putting a bit of empty space at the end of the thumb so that it would hit the edge prematurely, leaving the gutter showing. I mixed in some bitmaps from screen captures in a way that didn’t look too horrible—but my goal was only to do approximately as good a job as MacFox had done with the buttons “Top and Bottom”. If a better pixel artist wants to take this on and improve it, that would be great.

So there is the story. Beyond just addressing the problem I started out with, I think the source is reworked enough to be a nice place to start for anyone who is going to try theming scroll bars on their own. To assist with that, you can download the source to the scrollbar CSS files here:

  • aqua_scrollfix_together-1.0-fx.jar / global / scrollbars.css : view CSS
  • aqua_scrollfix_topbottom-1.0-fx.jar / global / scrollbars.css : view CSS

Feel free to ask questions if you have them!


Virtualization and the Integrated Circuit: Looking ahead

Saturday, November 3rd, 2007

This article is a written summary of a talk I gave at BarCamp LA #4, with some bugfixes. My presentation used a bunch of graphics swiped from google images since I was in a hurry, and I’d like to extend apologies/credit to my sources, all linked here: [highway barrier] [rc circuit] [resistor] [capacitor] [microcontroller] [grandparents] [moore’s law] [hanz/franz] [beos cortex]

View the talk slides

Virtualization is a big deal these days; it’s in the news and there’s a lot of activity in the stock market surrounding the phenomenon. I want to briefly talk about ways I foresee virtualization being applied that are a bit more radical than how it’s generally being thought of today. In particular, I make the following claim:

Software applications are going to be increasingly built up from dozens of virtual machines per program.

To give some supporting evidence, I’ll relate an analogy with how problem-solving in electronics evolved over the past few decades.

A simple electronics problem

By degree, I’m an electrical engineer. The kind of work we used to be paid to do started with a mission like: “Make this highway construction barrier light blink every 3 seconds”:

A highway construction barrier with light.

How to solve it the E.E. way

To do this, you’re going to need a power source and a light. That’s a given. But electrical engineers can recognize the highway-barrier task as one of those problems that can be done with just 2 additional parts—a resistor and a capacitor. In fact, it’s a textbook case. So I’ll show you what that kind of problem looks like:

Textbook example of an RC circuit.

(taken from http://www.physics.byu.edu/faculty/berrondo/su442/ac.pdf)

Picking a resistor R and a capacitor C can give you the desired effect of a certain brightness and a certain amount of time between blinks. But it’s not like there’s a specific value of C which corresponds to “3 seconds”. And there’s not a value of R which matches “12,000V”. If you picked two components that satisfied your specification, you’d have to work the equations out again and replace both pieces if anything changed.
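
For a rough feel for why the two parts can’t be chosen independently, here is the standard charging equation for a series RC circuit. The symbols are assumptions made for illustration and are not read off the figure: an ideal supply voltage $V_s$, a capacitor that starts at zero volts, and a lamp that fires once the capacitor reaches some threshold $V_{th}$.

```latex
V_C(t) = V_s\left(1 - e^{-t/(RC)}\right)
\qquad\Longrightarrow\qquad
t_{\text{fire}} = RC \,\ln\!\left(\frac{V_s}{V_s - V_{th}}\right)
```

Under those assumptions the timing depends on the product $RC$ together with the voltages, so no single component carries the “3 seconds” by itself, and changing $R$ for any other reason shifts the timing too.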

How to solve it the C.S. way

If you’re familiar with computer science, and someone told you to make a light blink periodically, your mind would probably jump to something like this:

const VOLTAGE_MAX = 12000;
const BLINK_PERIOD_SECONDS = 3;
const FLASH_TIME_MILLISECONDS = 500;
const UPDATE_INTERVAL = 100;
var lamp = HighwayBarrier.getLamp(this);
 
forever () {
   lamp.setVoltage(0);
   sleep_msec(
         BLINK_PERIOD_SECONDS * 1000 -
         FLASH_TIME_MILLISECONDS);
   lamp.setVoltage(VOLTAGE_MAX);
   sleep_msec(FLASH_TIME_MILLISECONDS);
}

I’m hand-waving here to say that these are functionally equivalent. The reality is a little too tied up with light & power to make this a “good” example. But it’s a simple and visceral way to see how digital signal processing (CS way) differs from analog circuit design (EE way).

Cost comparisons of the approaches

For the CS approach you’d need some kind of compiler or interpreter. And you’d need a CPU to run that program. Typing “microcontroller” into Google Images I found this fellow as the first hit:

A twenty dollar microcontroller

Then I looked for a price on it and got $20.61. That’s a heavyweight unit for this humble task, and crazy expensive—but I’m going to run with it. By contrast, the EE way uses two parts that probably cost a few pennies. Let’s estimate 10 cents for the capacitor and 10 cents for the resistor. So off the cuff we might guess that the EE way is approximately 100 times cheaper per Highway Light manufactured.

Furthermore, integrated circuits that can run arbitrary programs are not built out of 2 parts, but rather millions of individual electronic components. So we see how grotesquely wasteful the CS way is. But let’s look at the big picture.

Looking at the bigger picture

It’s pretty obvious how the CS code works—and I didn’t even comment it. If I asked your grandparents to change it so that it would go to 13,000 volts instead of 12,000 volts… or to blink over a 10 second period instead of 3 seconds, they’d probably be able to point to the right lines. But who thinks their grandparents could make the necessary changes to the circuit?

Plus there’s flexibility with the microcontroller: if you decide you want the light to ramp up and ramp down in brightness instead of blip on briefly, it’s easy code to write. If you want there to be a special case where during daylight hours the lights turn off to save power, you can do that with an “if” statement. Though computer science is a deep and complex field in its own right… things like this take a relatively trivial amount of education compared to analog circuit design. And modifying the program after the fact, once written by a competent programmer, can be extremely easy.
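
As a sketch of how small those changes are, here is one way the ramp and the daylight rule might look in C++. The function name lampVoltageAt, the linear ramp shape, and the isDaylight flag (imagine it coming from a light sensor) are all assumptions made for illustration; the constants mirror the ones in the listing above, with the period expressed in milliseconds.

```cpp
#include <cassert>

const int VOLTAGE_MAX = 12000;
const int BLINK_PERIOD_MSEC = 3000;
const int FLASH_TIME_MSEC = 500;

// Voltage the lamp should receive at a given (non-negative) time since start-up.
// The flash occupies the last FLASH_TIME_MSEC of each period and ramps
// linearly up to VOLTAGE_MAX and back down instead of switching abruptly.
int lampVoltageAt(long elapsedMsec, bool isDaylight)
  {
  if (isDaylight) return 0;                        // save power during the day

  long phase = elapsedMsec % BLINK_PERIOD_MSEC;    // position within the period
  long flashStart = BLINK_PERIOD_MSEC - FLASH_TIME_MSEC;
  if (phase < flashStart) return 0;                // dark part of the cycle

  long intoFlash = phase - flashStart;             // 0 .. FLASH_TIME_MSEC - 1
  long half = FLASH_TIME_MSEC / 2;
  long distanceFromPeak = intoFlash < half ? half - intoFlash : intoFlash - half;
  return static_cast<int>(VOLTAGE_MAX * (half - distanceFromPeak) / half);
  }
```

A control loop would call this every few milliseconds and pass the result to the lamp. The daylight rule really is one “if” statement, and the ramp is a handful of integer operations.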

So even assuming a factor of 100 manufacturing cost difference—with an overkill microcontroller that can dance and sing (should your project ever get new requirements)—the $20 could pale in comparison to the labor charge difference.

Virtual machines are like Integrated Circuits

In the years during my EE degree and afterward, I watched analog circuit engineering become a niche discipline. In the meantime, the norm for hardware solutions has become to use millions of electronic components to do a job that two or three well placed ones could do. This makes perfect sense: we know that human time and the ability to manage complexity don’t scale, but Moore’s Law has so far:

Moore's Law Graph

So now let’s start thinking of microprocessors as being like virtual machines. Then the parallel to resistors, capacitors, and inductors (the building blocks of analog circuits) would be lines of code. Just as a chip with a layout of components rivaling a large city was once seen as an overkill way of making a highway light blink… there are a lot of problems today that people wouldn’t even think of using a Virtual Machine for. It would be “crazy” and “wasteful”.

But I’m going to tell you my bet: software applications are going to be increasingly built up out of dozens of virtual machines per program. This will be due to a similar economy of scale to what happened with integrated circuits. So here are some radical ideas about those trends.

1. Porting and cross-platform libraries will die out

My hatred for porting comes partially from a deep-seated childhood grievance: you’d play a game like Pac-Man in the arcade, then buy the version for the Atari, and it would look nothing like Pac-Man. So porting and I started out on the wrong foot, and my experiences with porting code have only reaffirmed my belief that you shouldn’t do it. Here’s a picture that makes me sad, a few of the first 60,000 Google Images hits for “Tetris Screenshot”:

Tetris Ports Screenshot

I really believe application developers are going to pick a platform, make their program really work on that particular platform, and ship it out to users with the expectation that they are going to run it in a virtual machine. When you run the application the details of the underlying OS will be hidden from you; it will already be configured.

I expect we’ll see virtual machine versions of programs catch on and replace the idea of installing the native versions within the next couple of years. An implication here is that we’ll be saying goodbye to things like the Windows and MacOS versions of the GIMP. People will learn and accept the idea that different applications have different “skins”, so if the program doesn’t look and act like their host OS they won’t be terribly concerned.

2. Operating systems that just do interfaces

We’re already seeing that some operating systems are succeeding based on their ability to have a nice user interface presentation. People are doing serious work on MacOS — using it basically as a glorified terminal. I can imagine that applications of the future will have a virtual machine in them which is for an OS specifically suited to the user interface needs of the application, while that OS may not actually run any of the program logic.

In other words, rather than using a GUI library and interfacing with a client, you’ll have a full blown OS whose sole purpose is to look snazzy. I think it will be a great place to see OSes that haven’t been gaining traction otherwise, perhaps a resurgence of things like BeOS and OS/2.

3. Libraries running in their own virtual machines

I’ve pushed the idea that many applications you install on your desktop will be running in virtual machines. That’s not outlandishly far off of what’s actually happening. But how about something crazier: would you ever use a string library that had its own daemons and filesystem? Perhaps even something esoteric, like QNX?

Where we once used libraries, or even a few lines of custom code…we will see a rise in the usage of whole entire operating systems. You might laugh at that, but as Hans and Franz would say: “hear me now and believe me later”.

When performance and memory usage are less of a concern than security and the leveraging of human effort—you really might want to use heavily debugged and peer-reviewed systems that are maintained by a group that has standardized on a completely different platform from your project. It will be typical, if not expected, that your project will wire together virtual machines with all the casualness that people pipe together commands in the unix toolkit paradigm.

This is similar to how people are thinking with web services…taken to an extreme. There are already web apps that do their image manipulation, spell checking, or other bits of functionality by sending requests over TCP/IP to a web server and getting XML back. Web standards have been one force for untying people from using any particular operating system, and I think this will keep pushing on that. But you don’t need to invoke a network—and I think the most common case will be VM components that are run on the local machine as active libraries with access to the WAN disabled.

(Though if you wanted to, you’d sometimes log into it, and let it connect to the internet to update itself—all in its own sandbox.)

Further thoughts

These ideas push away from goals that we consider important today, like user interface consistency standards. We could have “Night of the Living Dead Operating Systems”, where a virtual machine running BeOS on the front end uses an OS/2 codebase to do its computational work, and packages it all up as if it were a “normal” application. This sounds really scary to some people—am I actually advocating this?

Well I’m not even saying that this is the way I think development should be done (besides the part about not wasting effort porting things!). One engineer I talked to sheepishly admitted he’d used a whole chip where basically a flip-flop would have sufficed… but it was easier to plug the part in, especially since he couldn’t predict what he might want that subsystem to do later. Similarly, with VMs, I think this convenience of using hardened elaborate components is going to pop up in all kinds of places we’d use two or three lines of carefully chosen code today.



Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported