USCII: Character Codes With Meaning

The USCII Standard for Information Interchange

UPDATE: Try out the new online Encoder and Decoder for USCII-5×7-ENGLISH-C0!


USCII (“you-ski”) stands for Universal Semiotic Coding for Information Interchange. It is a system for embedding pictures inside the coded numbers agreed upon to represent symbols and signals. I was inspired to create it by the famous Arecibo Message, which attempted to convey humanity’s physics knowledge without assuming a cultural context other than math.

For instance, instead of ASCII’s encoding of 65 for “A” and 66 for “B”, we might consider using the number 15621226033 for “A” and 32801506878 for “B”. To see the bitmaps, you must first convert these values into binary:

  • 15621226033 (base 10) =
    01110100011000110001111111000110001 (base 2)
  • 32801506878 (base 10) =
    11110100011000111110100011000111110 (base 2)
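
The conversion is easy to check in code. Here is a minimal Python sketch; the helper name `to_bits` is made up for illustration and is not part of any USCII tooling. It pads the binary form out to the full 35 bits, since leading zeros are part of the picture, and then converts back again:

```python
def to_bits(code: int, width: int = 35) -> str:
    """Return `code` as a zero-padded binary string of `width` bits."""
    if code < 0 or code >= 1 << width:
        raise ValueError(f"{code} does not fit in {width} bits")
    return format(code, f"0{width}b")


bits_a = to_bits(15621226033)
print(bits_a)          # 01110100011000110001111111000110001
print(int(bits_a, 2))  # 15621226033
```

Without the padding, the leading zero of the “A” pattern would be dropped and every row of the bitmap would shift by one position.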

When the transmission medium hints that 35-bit patterns are significant, the semiprime nature of 35 suggests decomposing it into the factors 7 and 5. Laying the bits out row by row as a rectangle 5 wide and 7 tall produces a picture, which happens to be a picture of the letter the code represents! For 01110100011000110001111111000110001, you would get:


0 1 1 1 0
1 0 0 0 1
1 0 0 0 1
1 0 0 0 1
1 1 1 1 1
1 0 0 0 1
1 0 0 0 1

Do you see the “A” drawn by the ones? Now let’s try putting a break at every fifth bit in 11110100011000111110100011000111110:


1 1 1 1 0
1 0 0 0 1
1 0 0 0 1
1 1 1 1 0
1 0 0 0 1
1 0 0 0 1
1 1 1 1 0
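
Breaking the bit string into rows can be sketched in a few lines of Python. The function name `render` and its default 5×7 dimensions are assumptions made for this example only:

```python
def render(bits: str, width: int = 5, height: int = 7) -> str:
    """Lay a bit string out row by row as a width x height grid."""
    if len(bits) != width * height:
        raise ValueError(f"expected {width * height} bits, got {len(bits)}")
    # Slice the string into consecutive rows of `width` bits each.
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    # Put a space between digits so the grid is easier to read.
    return "\n".join(" ".join(row) for row in rows)


print(render("11110100011000111110100011000111110"))
```

Run on the 35-bit pattern for “B”, this prints the same seven rows shown above.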

Since USCII is a general methodology, there can be many standards. The only requirement is that the bit length be a product of two primes, as 35 = 5 × 7 is. You could have a 31×31 USCII which establishes codes for all the characters in Chinese. Or a 7×11 USCII that has all the characters you need for Dutch…as well as a standard set of emoticons!
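
As a rough illustration, and assuming the intended requirement is that the bit length factors into exactly two primes (as 35 = 5 × 7 does), a receiver could recover the candidate grid dimensions like this. The name `grid_dimensions` is hypothetical:

```python
import math


def is_prime(k: int) -> bool:
    """Trial-division primality test; fine for small grid sizes."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))


def grid_dimensions(bit_length: int):
    """Return (p, q) with p <= q if bit_length is a product of exactly
    two primes, otherwise None."""
    for p in range(2, math.isqrt(bit_length) + 1):
        if bit_length % p == 0:
            # p is the smallest divisor, so it is prime; check the cofactor.
            q = bit_length // p
            return (p, q) if is_prime(q) else None
    return None  # bit_length is prime (or less than 4)


print(grid_dimensions(35))   # (5, 7)
print(grid_dimensions(77))   # (7, 11)
print(grid_dimensions(961))  # (31, 31)
print(grid_dimensions(36))   # None
```

The factoring only yields the two side lengths. Which one is the width still has to be judged by looking at both layouts.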

USCII codes have additional interesting properties, some of which I laid out in a blog post from last year.

Arecibo ASCII

I’ve developed a *draft* specification of USCII variation “5×7-ENGLISH-C0”. This uses 35 bits per character, and includes printable characters as well as the “C0 control codes”. You can read the script that generates it, which contains comments on why I picked the bit patterns:

specifications/uscii-5×7-english-c0.rebol

I’ve informally named this variant “Arecibo ASCII”. That’s because it is possible to losslessly convert a stream of conventional ASCII characters into USCII-5×7-ENGLISH-C0 (and back again). It’s still a work in progress, but here’s the table as it currently stands:

ASCII code   Character   Arecibo ASCII (35-bit binary)
0 Null character 10101010101010101010101010101010101
1 Start of Header 10101101111010110111101011011110101
2 Start of Text 11011111111101111111110111111111011
3 End of Text 11011110111101111011110111111111011
4 End of Transmission 11111111111111111111111111001110011
5 Enquiry 11111111010000011101101110000010111
6 Acknowledgment 11111101011111111111011101000111111
7 Bell 11011100011000110001000001111111011
8 Backspace 11111110111011100000101111101111111
9 Horizontal Tab 00000000000000111101000010000000000
10 Line Feed 11100001000010000100111110111000100
11 Vertical Tab 00100001000010000100001000000001110
12 Form feed 11111011100010000000111110111000100
13 Carriage return 00001000010010101101111110110000100
14 Shift Out 00100101111101111011110111110111100
15 Shift In 11100111011101111011110111011100100
16 Data Link Escape 11111111110010001110001001111111111
17 Device Control 1 11111101111001110001100111011111111
18 Device Control 2 11011110110101001010011100111010001
19 Device Control 3 11111101011010110101101011010111111
20 Device Control 4 11111100011000110001100011000111111
21 Negative Acknowledgement 11111101011111111111100010111011111
22 Synchronous Idle 11111111111111111111111110101011111
23 End of Trans. Block 11111000000111001010011100000011111
24 Cancel 10001000000101000100010100000010001
25 End of Medium 11111100010110001010001101000111111
26 Substitute 10001011101111011101110111111111011
27 Escape 00011001100101011110011100111010001
28 File Separator 10101101011010110101101011010110101
29 Group Separator 11011110111101111011110111101111011
30 Record Separator 11110111101101010010000001001111011
31 Unit Separator 11111111111111111111100111101110111
32 Space 00000000000000000000000000000000000
33 ! 00100001000010000100000000000000100
34 " 01010010100101000000000000000000000
35 # 01010010101111101010111110101001010
36 $ 00100011111010001110001011111000100
37 % 11000110010001000100010001001100011
38 & 01100100101010001000101011001001101
39 ' 01100001000100000000000000000000000
40 ( 00010001000100001000010000010000010
41 ) 01000001000001000010000100010001000
42 * 00000001001010101110101010010000000
43 + 00000001000010011111001000010000000
44 , 00000000000000000000011000010001000
45 - 00000000000000011111000000000000000
46 . 00000000000000000000000000110001100
47 / 00000000010001000100010001000000000
48 0 01110100011001110101110011000101110
49 1 00100011000010000100001000010001110
50 2 01110100010000100010001000100011111
51 3 11111000100010000010000011000101110
52 4 00010001100101010010111110001000010
53 5 11111100001111000001000011000101110
54 6 00110010001000011110100011000101110
55 7 11111000010001000100010000100001000
56 8 01110100011000101110100011000101110
57 9 01110100011000101111000010001001100
58 : 00000011000110000000011000110000000
59 ; 00000011000110000000011000010001000
60 < 00010001000100010000010000010000010
61 = 00000000001111100000111110000000000
62 > 01000001000001000001000100010001000
63 ? 01110100010000100010001000000000100
64 @ 01110100010000101101101011010101110
65 A 01110100011000110001111111000110001
66 B 11110100011000111110100011000111110
67 C 01110100011000010000100001000101110
68 D 11100100101000110001100011001011100
69 E 11111100001000011110100001000011111
70 F 11111100001000011110100001000010000
71 G 01110100011000010111100011000101111
72 H 10001100011000111111100011000110001
73 I 01110001000010000100001000010001110
74 J 00111000100001000010000101001001100
75 K 10001100101010011000101001001010001
76 L 10000100001000010000100001000011111
77 M 10001110111010110101100011000110001
78 N 10001100011100110101100111000110001
79 O 01110100011000110001100011000101110
80 P 11110100011000111110100001000010000
81 Q 01110100011000110001101011001001101
82 R 11110100011000111110101001001010001
83 S 01111100001000001110000010000111110
84 T 11111001000010000100001000010000100
85 U 10001100011000110001100011000101110
86 V 10001100011000110001100010101000100
87 W 10001100011000110001101011010101010
88 X 10001100010101000100010101000110001
89 Y 10001100011000101010001000010000100
90 Z 11111000010001000100010001000011111
91 [ 01110010000100001000010000100001110
92 \ 00000100000100000100000100000100000
93 ] 01110000100001000010000100001001110
94 ^ 00100010101000100000000000000000000
95 _ 00000000000000000000000000000011111
96 ` 01000001000001000000000000000000000
97 a 00000000000111000001011111000101111
98 b 10000100001000011110100011000111110
99 c 00000000000111110000100001000001111
100 d 00001000010000101111100011000101111
101 e 00000000000111010001111111000001111
102 f 00010001010010001110001000010000100
103 g 00000000000111110001011110000111110
104 h 10000100001000011110100011000110001
105 i 00000001000000000100001000010000100
106 j 00010000000001000010000101001001100
107 k 01000010000100101010011000101001001
108 l 01100001000010000100001000010001110
109 m 00000000001101110101101011010110001
110 n 00000000001011011001100011000110001
111 o 00000000000111010001100011000101110
112 p 00000000001111010001111101000010000
113 q 00000000000111110001011110000100001
114 r 00000000001011011001100001000010000
115 s 00000000000111110000011100000111110
116 t 00100001001111100100001000010100010
117 u 00000000001000110001100011000101110
118 v 00000000001000110001100010101000100
119 w 00000000001000110001101011010101010
120 x 00000000001000101010001000101010001
121 y 00000000001000101010001000010001000
122 z 00000000001111100010001000100011111
123 { 00011001000010001000001000010000011
124 | 00100001000010000000001000010000100
125 } 11000001000010000010001000010011000
126 ~ 00001011101000000000000000000000000
127 Delete 11111110001010100010101011100011111
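
The lossless round trip between ASCII and this table can be sketched as follows. This is only an illustration under stated assumptions: it uses a three-row excerpt of the table (space, “A”, and “B”) rather than all 128 rows, and the names `TABLE`, `encode`, and `decode` are invented for the example, not taken from the specification script or the online encoder.

```python
BITS_PER_CHAR = 35

# Three rows copied from the table; a real converter would load all 128.
TABLE = {
    " ": "0" * 35,
    "A": "01110100011000110001111111000110001",
    "B": "11110100011000111110100011000111110",
}

# Decoding only works if every bit pattern is unique.
REVERSE = {bits: char for char, bits in TABLE.items()}
assert len(REVERSE) == len(TABLE), "duplicate bit patterns in table"


def encode(text: str) -> str:
    """Concatenate the 35-bit pattern for each character."""
    return "".join(TABLE[char] for char in text)


def decode(stream: str) -> str:
    """Split the stream into 35-bit chunks and look each one up."""
    if len(stream) % BITS_PER_CHAR:
        raise ValueError("stream length is not a multiple of 35")
    chunks = (stream[i:i + BITS_PER_CHAR]
              for i in range(0, len(stream), BITS_PER_CHAR))
    return "".join(REVERSE[chunk] for chunk in chunks)


message = "AB BA"
assert decode(encode(message)) == message
print(len(encode(message)))  # 175
```

The conversion is lossless only as long as no two characters share a bit pattern, which is why the sketch checks for duplicates before building the reverse lookup.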

The C0 codes are admittedly rather tricky, especially when it comes to depicting things like “Data Link Escape” or “Device Control 1”! It would be possible to use a larger bit size and get clearer images. But I’d like to see how far the 35-bit standard can go in cueing people who aren’t familiar with ASCII into what the bitmaps signify…

2 Responses to “USCII: Character Codes With Meaning”

  1. RSMilward Says:

    Neat idea! In your examples for A and B, may I suggest you insert a space between the digits (for better physical proportions), and reverse the red and blue colors (I kept trying to “read” the blue characters, which are the background). And I think the crossbar for the A should be one row higher.
    [I did something like this years ago to punch human-readable messages into paper tape (with an ASR-33 Teletype driven by a Data General Nova minicomputer, using BASIC), but used a 6-dot-high character, leaving an unpunched row along the edges of the tape to make it less easily tearable. Yes, I’m old :-) ]

  2. Hostile Fork Says:

    @RSMilward: You are right, the red/blue was hard to read. I switched to black and gray and it’s much better.

    As for changing the symbol for A… well, one issue is that I have sent a few Arecibo Ascii messages out into the ether by now. And if I change the patterns then that means they won’t decode. Which wouldn’t be the end of the world with this standard, especially since people can go down to the bits and judge for themselves what the letter is!

    But the font was not my design, it came from the PIC CPU:

    http://www.noritake-itron.com/Softview/fontspic.htm

    I’m assuming they had a reason for putting the crossbar where they did. In context it may help differentiate the A… I don’t know (?) It’s certainly not the most questionable choice in the standard so far.

    What I did do is check in an update so that all the printable characters are now easy to edit in the source:

    http://github.com/hostilefork/uscii/blob/master/specifications/uscii-5×7-english-c0.rebol

    So if the idea entertains you enough to want to get involved in helping nail down the symbols for a “formal” release…I’ll check the pull requests. :)

    Thanks for the interest, in any case!


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported