BLACKHIGHLIGHTER: Protected Public Discussions


NOTE: Blackhighlighter is in the midst of a port from Apache+Django+MySQL (run on dedicated servers) to Node.js+Express+Swig+MongoDB (run on Cloud Foundry). Until that is complete and documented, this page will be out of sync with the GitHub repo! I’ll try to keep the links working, though.

BlackHighlighter combines modern cryptography and client-side editing to create a new way of communicating on the Internet. You publish text on a network server with the ability to “protect” certain parts of what you have written. These protected parts can only be read by those you give access to (and, of course, by whoever they share them with!).

This may bring some “cloak and dagger” and legalistic scenarios to mind. Yet despite the obscuring nature, this was conceived as a tool to help achieve greater transparency! The hope is to coax completely private conversations into the light so they are mostly public. There’s also an element of accountability in the mix: using a commitment scheme, any missing portions that surface can be checked to ensure they are the text that was originally blacked out at the time of publication.

BlackHighlighter takes the guesswork out of forwarding. The sender points out any sensitive information at the time of writing, so anyone receiving certificates with the missing bits understands those should be treated as confidential. Yet everything else was published on a server for anyone to read, so there’s a clear delineation of what’s okay to share. This stops misunderstandings, and removes unnecessary blocks on the flow of information to those who may need it.

The easiest way to understand it is probably to see it in action, so I built a working prototype. The underlying system supports multiple separate “colors” of redaction pens that can be revealed independently, but the UI doesn’t let you pick a color at the moment. Here is a screencast (circa May 2009):

I’ve published the source to my demo in order to get community feedback. Yet just as Ward Cunningham outlined “The Wiki Way” as more of a collaborative mindset than a specific technology, I’m most interested in seeing the “spirit” of the idea spread. The point is to be transparent to the greatest extent possible, while transforming any “unknown unknowns” into “known unknowns” for which you take responsibility.

(UPDATE: I’ve put up a tentative demo server running the Django app at blackhighlighter.org. It’s a bare-bones site which hosts the app and has no other site features, so don’t expect too much. But it shows the workflow of using BlackHighlighter.)

The Code

The server is written in Python and is built on the Django framework. The client code is JavaScript and HTML, and depends on jQuery UI. It’s my first time using these frameworks, as well as my first Python program (and Albumist is my only other JavaScript effort). So peer review is very welcome!

It’s released under the GNU Affero General Public License, Version 3 and you can browse it on GitHub:

http://github.com/hostilefork/blackhighlighter/tree/master

The biggest workhorses here are the JavaScript files accompanying the tabbed page for writing letters (write.js) and for reading+verifying+revealing letters (read.js). What you’d think of as the HTML is in the /templates/ directory. For those familiar with Django, the Python code should match your intuition (with models.py and views.py).

An important aspect in the separation of the code is that the protected parts of your text are never sent over the network in an unencrypted form. This way you don’t have to trust the person running the server not to read or reveal the protected parts—they don’t have them! Yet the system is in its infancy, so if you’re a cryptography expert then please chime in with any ideas. (Here’s what I got so far from sci.crypt.)
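To make the idea concrete, here is a minimal Python sketch of a hash-based commitment. It is not taken from the Blackhighlighter source: the function names, the certificate layout, and the use of a random salt are all assumptions made for illustration. Only the commitment hash would be published; the salt and the protected text stay in a certificate handed to the chosen readers.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def commit(protected_text: str) -> Tuple[str, Dict[str, str]]:
    """Return (commitment, certificate) for one protected span.

    Hypothetical layout: the commitment is SHA-256 over salt + text.
    The salt is an assumption; it keeps short or guessable text from
    being recovered by hashing likely candidates.
    """
    salt = secrets.token_hex(16)  # fixed-length hex, so salt + text is unambiguous
    digest = hashlib.sha256((salt + protected_text).encode("utf-8")).hexdigest()
    certificate = {"salt": salt, "text": protected_text}
    return digest, certificate


def verify(commitment: str, certificate: Dict[str, str]) -> bool:
    """Check that a certificate really opens the published commitment."""
    material = (certificate["salt"] + certificate["text"]).encode("utf-8")
    candidate = hashlib.sha256(material).hexdigest()
    return hmac.compare_digest(candidate, commitment)
```

In this sketch the server would store only the value returned as the commitment, so it has nothing to leak, while anyone who later receives a certificate can call verify() to confirm that the text is what was blacked out at publication time.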

There is also protection in the protocol to prevent the server from lying about the contents of the message or the commitment hashes. The URL itself contains a hash of the canonicalized commitment JSON. This way, even if you don’t have a certificate you can verify that what’s on the server is what it should be. You only need the URL itself to verify that the contents are correct! This is a step up from using a random ID for each blackhighlighter text.
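The check a reader performs could look roughly like the Python sketch below. The canonical form (sorted keys, no extra whitespace), the choice of SHA-256 for the URL hash, the field names, and the URL layout with the hash as the last path segment are all assumptions for illustration, not the project's actual format.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace.

    This is one possible canonical form, assumed for the example.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def letter_id(commitment: dict) -> str:
    """Hash of the canonical commitment, used as the letter's identifier."""
    return hashlib.sha256(canonical_json(commitment)).hexdigest()


def matches_url(url: str, served_commitment: dict) -> bool:
    """Compare the hash embedded in the URL with what the server returned.

    Hypothetical URL layout: the last path segment is the hash.
    """
    claimed = url.rstrip("/").rsplit("/", 1)[-1]
    return claimed == letter_id(served_commitment)
```

Because the identifier is derived from the content, a server that alters the public text or swaps a commitment hash would serve data that no longer matches the URL the reader already holds.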

I’m going to have to put together an INSTALL file to help anyone unfamiliar with Django get started running this on their own server, and maybe zip up the required libraries so you don’t have to hunt them all down yourself. In the meantime, the video above is probably the easiest way to experience and share the idea. (I’m not thrilled about setting up and administering an Apache server to run this for the general public myself, but will probably do so anyway.)

History and Motivation

I started the project after hearing Kimo Crossman describe a problem with getting information from the government. Their conventional use of Microsoft Word or Excel freely mixes in sensitive information with matters of public record. Lawyers have to “scrub” the confidential data out of the files (which may include “meta-data” that is hidden in parts of the document one does not typically see!)

Because there is cost associated with separating the private data from the parts that should be public, it’s an uphill battle in the courts to get the documents. Kimo suggested remedying this with tools for declaring the sensitive information at the time it’s entered. Ideally this would allow everything else to be published instantly to the web.

My variation on his idea was to create something that people outside the government could use, thus building awareness that working in this style is even possible! So I envisioned a service for communicating with U.S. Senators and Representatives. By encouraging people to formulate their correspondence so most of it could be shared on the Internet, it would effectively make the inboxes of elected leaders searchable by anyone. Yet it wouldn’t have the usual problem of anonymous Internet posts, because the protected portions would be sent to the official.

I intended to submit my site to the 2009 Apps for America Competition, but only had two weeks and couldn’t make the deadline. So I took that as a mixed blessing, and decided to modularize the code so that it might be used in blog commenting systems or other parts of a site. I’m still looking to enter this into a web innovation contest or grant program of some sort, so if you have any leads then let me know! :)

4 Responses to “BLACKHIGHLIGHTER: Protected Public Discussions”

  1. Troy Gardner Says:

    Very cool, here’s where I’d like the spirit to go.

    - a whiteboard version (LolBlackCats)
    - a realtime collaborative version (BlackSkype)
    - a wiki version

    Are there real black highlighters with switchable redaction? I’d like to get a
    case for the NSA, FBI, and those dudes who patrol Area 51.

  2. Lloyd Budd Says:

    Awesome work!

    You lost me a little at “I think this is a good alternative to approaches like Gmail, where the organization running the servers have a sneaky advantage in being able to read & analyze your private data”. Not sure I understand how Gmail relates. Are you just referring to gmail generally?

    Isn’t it the “protected parts of your text are never sent over the network” *unencrypted*. Unencrypted being an important part ;-)

  3. Hostile Fork Says:

    Hi Lloyd, glad you like it! Thanks for the feedback on the article, I’m trying to hone it a bit here in the early stages. I also want to put together a screencast with an actual script and am looking for some of the most compelling use cases. If you know anyone who would be interested or maybe have a thought to pitch in, please forward it on!

    You’re right that my point about Gmail seemed like a non-sequitur, so I reworded that part. It’s just that lately I’ve thought about how people accept how Google indexes and analyzes your private email and keeps the generated data to themselves. Here I’m supporting how much value is latent in those communications—but it shouldn’t be one company who you give that value to and you should be composing using a client that lets you protect sensitive or identifying parts, rather than trusting third parties to do the anonymization in a way you agree with.

    Blackhighlighter doesn’t ever precisely send “the message” encrypted over the network (as you might say you had done if you sent a password-locked ZIP file). SHA-256 is “just a hash”, and due to the pigeonhole principle there are an infinite number of messages that exist with that hash. For clarity I think it’s safe to say what you suggest, that “protected parts of your text are never sent over the network unencrypted”, and ignore the subtle question of whether they are actually “sent encrypted” or not.

  4. aremvee Says:

    i wonder if it could be used to redact content from emails already sent? How i wish i had the facility to Kill a sent mail before the recipient possibly actually opened it.

    At least if something i wasn’t sure about was Black Highlighted, i could pull it back, or it flags to the reader i’m not really sure.

    No doubt all govt documents would then be sent fully blacked all the time diminishing the benefit thereof.



Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported