Sound of a numbers station

Numbers are a great medium for communication. In fact, everything you ever stored or sent through a computer – every document, every song, every pirated movie – exists as a number. Culture obviously shapes how we choose and assign and interpret those numbers. But those numbers also in turn shape culture.

The World on a Wire Show.

Opening theme music

Season One, Episode One.

Beautiful numbers, magic numbers, & illegal numbers

If you haven’t formally studied computer science or computer engineering, someone may have told you, perhaps patronizingly, that computers “think” in binary, or in ones and zeroes. Here’s what that actually means. Imagine electrical pulses funneling telegraph messages from a teletypewriter in San Francisco to a teleprinter in New York. A vast board of cryptically numbered two-way Bakelite switches. Microscopic hills and valleys laser-etched into the reflective surface of an optical disk. Imperceptibly fast flashes of light channeled through fiber-optic cable. What do these scenarios have in common? Digital communication and digital information storage are all about reducing everything – from the individual letters and punctuation marks of a telegram to the ever-shifting frequencies of the human voice and the precise color of each of two million pixels that have to be updated thirty times per second to display a YouTube video – to a series of binary states, states of on-and-off, of high-and-low-voltage, or of ones and zeroes. Transmitting information in this symbolic form allows contemporary technology to use what were once unreliable, unforgivably noisy signals – infrared, weak short-range radio broadcasts, transcontinental cable networks, even radio communication with artificial satellites.

Sound of a Morse code transmission

It turns out to be a lot easier to understand the dots and dashes of Morse code in the scratchy signal from a distant radio broadcast than it is to make out the words spoken by even the most eloquent human operator on that same signal. By rendering all information to something like ones and zeroes, contemporary machines propagate incomprehensible amounts of information across global networks made of these weak and treacherous signals, virtually without copying errors.

Now, ones and zeroes are obviously numbers. And if you string enough of them together, you can express in binary any number that you can express in the decimal number system. 101010 is the standard binary representation of the number forty-two, for example. So any string of ones and zeroes is a number, and any number can be a string of ones and zeroes. But in computer science people often prefer not to write numbers in the decimal number system we use for everything else. Here’s why. Just as any decimal number is written with a certain number of digits – so one-hundred forty-three is written as 143, with three decimal digits – a binary number is expressed with a certain number of binary digits or bits. And it turns out that the number of bits needed to write a number doesn’t quite line up with the number of decimal digits needed to write that same number; the number nine is written with one decimal digit and takes four bits to express in binary, but the number ten, which is of course written with two decimal digits, still takes only four bits to express in binary. However, if you only use the digits 0 through 7, you can cover every possible string of three bits. And just like we can express numbers using just ones and zeroes, we can write numbers using only the eight digits 0 through 7; this is called an octal number representation. If you have a very long binary number, and you take just three bits at a time, starting from the smallest or least significant part of the number, you can replace each string of three bits with a single octal digit and the result is that same number expressed in octal. But these days computer people are more likely to take four bits at a time, and express each group of four bits as a single hexadecimal digit, which can be one of the digits 0 through 9 or one of the Latin alphabet characters A through F, where 0 has a value of zero, A has a value of ten, and F has a value of fifteen. 
Writing numbers this way is annoying if you’re not used to it, but makes it easier to read computer data in the long run, because you can translate each digit back to four bits if you have to, and it’s much more compact than expressing the same data in binary.
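
To make the grouping trick concrete, here is a minimal sketch in Python. It is an illustration rather than anything from the episode itself, and the helper name group_bits is made up for this example. It writes forty-two in binary, then rebuilds the octal and hexadecimal forms by taking three or four bits at a time from the least significant end.

```python
# Forty-two, the example from the text, written in three different bases.
n = 42
binary = format(n, "b")          # binary digits as a string
octal = format(n, "o")           # octal digits as a string
hexa = format(n, "x").upper()    # hexadecimal digits as a string


def group_bits(bits: str, size: int) -> list[str]:
    """Split a bit string into groups of `size` bits.

    Padding with zeroes on the left is the same as counting groups
    from the least significant (rightmost) end of the number.
    """
    padded_length = -(-len(bits) // size) * size  # round up to a multiple of size
    padded = bits.zfill(padded_length)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


# Each group of three bits becomes one octal digit,
# and each group of four bits becomes one hexadecimal digit.
octal_from_groups = "".join(str(int(g, 2)) for g in group_bits(binary, 3))
hex_from_groups = "".join("0123456789ABCDEF"[int(g, 2)] for g in group_bits(binary, 4))

print(binary, octal_from_groups, hex_from_groups)

# Nine and ten need different numbers of decimal digits
# but the same number of bits.
print((9).bit_length(), (10).bit_length())
```

The grouped results match what Python's own formatting produces, which is the whole point of octal and hexadecimal: each digit stands for a fixed number of bits.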

But to express information like text and images as binary strings, we have to decide what number represents the letter A, or a very specific shade of blue, or 天 (tiān), the Han Chinese character that signifies the sky or heavens. This process of converting information to numbers is called encoding. So, for example, the color encoding system we use for most purposes maps specific colors to the numbers zero through sixteen million, seven-hundred seventy-seven thousand, two hundred fifteen (which is written as FFFFFF in hexadecimal). Digital display hardware uses these numbers to determine how brightly to illuminate the individual red, green, and blue components that make up each pixel on a screen. This encoding system turns out to be pretty adequate at covering the range of colors that a typical human eye can detect, though of course some displays are better at producing these colors than others.
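
As a rough sketch of how one of those color numbers breaks down, the Python below splits a color into its red, green, and blue parts. It assumes the common layout of eight bits per component with red in the highest position; the episode doesn't spell that layout out, and the function name split_rgb is just for illustration.

```python
def split_rgb(color: int) -> tuple[int, int, int]:
    """Split a 24-bit color number into (red, green, blue).

    Assumes 8 bits per component, with red stored in the highest bits.
    """
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return red, green, blue


largest = int("FFFFFF", 16)      # the largest color number in this scheme
print(largest)                   # the decimal value spoken in the text
print(split_rgb(largest))        # every component at full brightness
print(split_rgb(0x336699))       # an arbitrary muted blue
```

Three components with 256 levels each give 256 × 256 × 256 possible colors, which is where the sixteen-million figure comes from.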

This illustrates a recurring theme in the numerical encoding of sensory stimuli; whereas in the natural world we have the whole spectrum of real numbers to quantify the wavelength of a color, or the frequency and amplitude of a sound, or the precise location of a point in three-dimensional space, in computers we often have to settle for good-enough discrete approximations of those real numbers. If the approximation is good enough, if its precision approaches the differential threshold or “just-noticeable difference” for that sense, then the only real limits on its fidelity to the analog source are given by the capabilities of hardware (such as headphones, a computer monitor, or a 3D printer) and the given bandwidth or storage space, which might necessitate omitting or otherwise fudging some of the numbers so everything can be compressed to a manageable size. Any sounds that can be recorded on vinyl can be reproduced on an audio CD with enough precision that a typical human ear could not detect any differences, but a listener may begin to notice differences when that sound is compressed for internet streaming or to produce an mp3 file that won’t take up too much room on a smartphone or laptop hard drive.
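
Here is a small Python sketch of what a "good-enough discrete approximation" looks like in practice. It snaps a single sound sample to the nearest value that a given number of bits can represent. The 16-bit depth is the one audio CDs use; the 4-bit depth is an exaggerated stand-in for a very coarse approximation, and the function name quantize is an assumption made for this example.

```python
def quantize(sample: float, bits: int) -> float:
    """Snap a sample in the range -1.0..1.0 to the nearest level that a
    signed integer with the given number of bits can represent."""
    levels = 2 ** (bits - 1) - 1
    return round(sample * levels) / levels


original = 0.3                    # one instant of an analog signal
fine = quantize(original, 16)     # CD-style precision
coarse = quantize(original, 4)    # deliberately crude precision

print(abs(fine - original))       # error far too small to hear
print(abs(coarse - original))     # error thousands of times larger
```

The fewer bits we allow, the fewer levels there are to choose from, and the further each stored number can drift from the real value it approximates.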

But not all encoding schemes are this comprehensive. For years, the most pervasive standard for encoding text on computers was ASCII, the American Standard Code for Information Interchange, a scheme developed for those teletype machines I mentioned earlier – machines that transmitted messages across telegraph wires long before computers came on the scene. The problem with ASCII was that it was designed to efficiently communicate English-language messages from a teletypewriter at one end of the wire to a teleprinter at the other end, and nothing else. These machines had an interface that was pretty similar to the mechanical typewriters that some people had in their homes and offices – except that the typist and the machinery that printed the message on paper could be separated by a considerable distance. So ASCII was designed to use only seven bits of information for each code, allowing one hundred twenty-eight codes in all. Thirty-three of those codes were non-printable codes used for various control signals that the teletype machines could send to each other. The rest of the codes were used for the twenty-six unaccented characters of the Latin alphabet used to write English, in upper-case and lower-case forms, numerals 0 through 9, and various punctuation marks, including the at sign (@), then used primarily in accounting, and a grave accent all on its own, which a teletypist could use by shifting the teleprinter back to the previous letter and printing the accent mark over it. I mention these punctuation marks in particular because the fact that they were present in ASCII meant that as computer researchers like Grace Hopper started typing programs into computers as a mix of English-language keywords and symbols, the punctuation marks were given special new meanings for a computing context. That grave accent became known as a backtick and could be used, for example, to embed the output of a command in another command in a Bash terminal. 
The at sign should be familiar to anyone who has used email or Twitter.
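
The arithmetic of ASCII is easy to check. This Python sketch counts the control codes and the printable codes, then shows the seven-bit pattern for a few characters mentioned above. It relies only on the fact that Python's ord() returns the same numbers ASCII assigns to these characters.

```python
# Seven bits allow 2**7 = 128 codes in all.
total_codes = 2 ** 7

# The non-printable control codes are 0 through 31, plus 127 (delete).
control_codes = [c for c in range(total_codes) if c < 32 or c == 127]

# Everything else is printable: space, letters, numerals, punctuation.
printable = "".join(chr(c) for c in range(32, 127))

print(total_codes, len(control_codes), len(printable))
print(printable)

# The code number and seven-bit pattern for a letter, the at sign,
# and the lone grave accent (the "backtick").
for ch in "A@`":
    print(ch, ord(ch), format(ord(ch), "07b"))
```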

But because of its focus on American English, ASCII couldn’t express the characters used to write the native languages of, well, most people on Earth. It didn’t have the Eszett (ß) used in German or the a-ring (å) used in Swedish. And it was completely inadequate for writing Vietnamese, or Russian, or Sanskrit, or Korean, or Chinese. So while American computers typically used ASCII, western European computers used variations on that standard that provided the accented characters and alternative currency symbols they needed. This worked fine for local business communications, but when it came to programming or to simply communicating with computer users from another country, it could cause a lot of problems. What an American computer understood as an opening curly brace ({) might appear on a Swedish computer screen as a lower-case letter a with an umlaut (ä). And it only gets more complicated. Computer scientists in the Soviet Union had to settle on a character encoding standard that would support the Cyrillic alphabet, and computer scientists in China needed a standard that could accommodate literally thousands of characters. The result of all this was that computers the world over used a myriad of co-existing partial standards for converting text to numbers and back, and there were times when, if you were browsing some foreign corner of the World Wide Web, you might have to tell Netscape Navigator or Internet Explorer to try a different character encoding so the page you were looking at would stop being an indecipherable mess of accented letters and punctuation marks that didn’t belong next to each other and start looking more or less like it did on the computer that was used to write it.
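
The sketch below reproduces that kind of mix-up in Python. The first part uses only the single pair named above (the curly brace that shows up as ä); a real national variant remapped several more codes, so the table here is deliberately partial. The second part uses two later eight-bit encodings that ship with Python, Latin-1 and Windows-1251. Those two are my own choice of illustration and are not named in the episode.

```python
# One pair from the text: the code an American machine showed as "{"
# appeared as "ä" under the Swedish variant. Deliberately partial table.
SWEDISH_VARIANT = str.maketrans({"{": "ä"})

american = "if (ready) {"
print(american.translate(SWEDISH_VARIANT))

# The same problem on a bigger scale: identical bytes, two different readings.
text = "blåbär"                      # Swedish for "blueberry"
raw = text.encode("latin_1")         # the numbers that actually get stored
as_written = raw.decode("latin_1")   # read back with the right encoding
misread = raw.decode("cp1251")       # read back with a Cyrillic encoding
print(as_written, misread)


def fits_in_ascii(s: str) -> bool:
    """Report whether every character of s has an ASCII code."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("@"), fits_in_ascii("ß"), fits_in_ascii("å"))
```

Nothing about the stored numbers changes between the two readings. Only the table used to turn them back into letters does.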

In the early 1990s, an industry group called the Unicode Consortium published the first versions of a new character encoding scheme designed to replace ASCII and all the other standards. The Unicode Standard for character encoding is designed to cover as many commonly-used human writing systems as possible, including both simplified and traditional Chinese, the Cherokee syllabary, and even Linear A, an ancient Minoan script that has never been conclusively deciphered by modern scholars. Thanks to some copying errors in the creation of a Japanese character encoding standard that was later incorporated into Unicode, and further such errors made when bringing the Chinese, Japanese, and Korean standards into one system, Unicode even includes a few so-called “ghost characters” that have never been used in actual writing. The fonts on your computer might not support all of the writing systems and characters covered by Unicode, and might show you little boxes or other such generic symbols in place of the characters they’re supposed to show, but Unicode ensures that our computers agree on what letter or symbol any number in a text file represents.
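
Python strings are Unicode, so it is easy to look up the number assigned to any character. This short sketch prints the numbers for a few characters from this episode. It also shows the bytes produced by UTF-8, one widely used way of storing Unicode numbers in a file; UTF-8 isn't discussed in the episode and appears here only as an example.

```python
# Each character has one agreed-upon number (its "code point"),
# conventionally written in hexadecimal with a U+ prefix.
for ch in ["A", "ß", "å", "天"]:
    code_point = ord(ch)
    stored = ch.encode("utf-8")      # the bytes a UTF-8 text file would hold
    print(ch, f"U+{code_point:04X}", code_point, stored.hex(" ").upper())

# Going the other way: from the number back to the character.
print(chr(0x5929))
```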

However, the advent of Unicode has not eliminated the fundamental anglocentrism of the way programmers interface with computers. Not only do all of the world’s most widely-used programming languages still rely on English-language keywords like if, while, and class, but some of the tools programmers use still expect that filenames or the files themselves will contain only ASCII characters.

All of this is to say that the way we represent communication, information, and the creative output of human beings as numbers has consequences for our lexicon – after all, who was using the at sign for anything other than accounting before the advent of computer networks? – and also effectively establishes linguistic or cultural hegemony in the field of computer science.

In 2012, computer scientist Ramsey Nasser created the programming language قلب, which has a similar syntax to a popular programming language called Scheme, but uses only Arabic-language keywords and doesn’t involve the Latin alphabet at all. Here’s what he had to say about it.

قلب is built entirely on Arabic, and everything broke. Every text editor just has no idea what to do. The terminal is useless. All of the tools that I use to be creative while writing code fall apart, and I thought it would be interesting to challenge that.

Ramsey Nasser created قلب as part of an art project; by writing programs in Arabic he was able to build on traditions of Arabic calligraphy and tile mosaics to create programs with visually beautiful source code; I encourage you to visit his web site at nas.sr – for those of you who prefer the NATO alphabet that’s November Alpha Sierra Dot Sierra Romeo. And I have to say that right now, in 2019, he’s still right about the fundamental deficiencies of writing system support in developer tools. When I was scripting this episode, I pasted the word قلب into the text editor I was using, a program called Vim, and the individual characters of the Arabic word were displayed in reverse order. Arabic, of course, is written right-to-left, but the document I was writing was primarily in English and therefore left-to-right, and without any special configuration Vim didn’t know how to handle that. (In fairness, I should note that, given the right commands, Vim does support viewing and editing text files written entirely in Arabic, but the implementation and use of this feature is made difficult by the assumptions baked into the terminal environment in which I use Vim.)

But let’s get back to those numbers. Unicode gives us all we need to transform every single character of the text of, say, the complete works of William Shakespeare into a number – and stringing all those numbers together, we get one very long number that, when read properly by a computer, expresses the entirety of that text. If we save this long number as a file, the computer needs some way to detect, later on, that the file is Unicode text and not, say, a JPEG-format image. One way this can be accomplished is with magic numbers. A JPEG file begins with the number sixty-five thousand, four hundred ninety-six – that’s FFD8 in hexadecimal – and ends with the number FFD9. Since all files are just numbers, these magic numbers give the computer a clue that the numerical information in a particular file provides color information about the pixels of an image, stored and compressed in a particular format, and not, say, text, or the electromagnetic pulses required to reproduce a Wendy Carlos recording.
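
Here is a minimal sketch of a magic-number check in Python. It only looks at the first two and last two bytes, exactly as described above, so it is a clue rather than proof that a file is a valid image. The function name looks_like_jpeg and the sample data are made up for this example.

```python
JPEG_START = bytes.fromhex("FFD8")   # magic number at the start of the file
JPEG_END = bytes.fromhex("FFD9")     # magic number at the end of the file


def looks_like_jpeg(data: bytes) -> bool:
    """Check only the opening and closing magic numbers."""
    return (
        len(data) >= 4
        and data.startswith(JPEG_START)
        and data.endswith(JPEG_END)
    )


# The opening magic number, read as an ordinary decimal number.
print(int.from_bytes(JPEG_START, "big"))

# A short piece of text is also just one long number.
line = "To be, or not to be".encode("utf-8")
print(int.from_bytes(line, "big"))

# Stand-in data: not a real picture, just the right first and last bytes.
pretend_image = JPEG_START + bytes(10) + JPEG_END
print(looks_like_jpeg(pretend_image), looks_like_jpeg(line))
```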

And because these files exist as numbers, we can do mathematical operations on them. At first, this might not sound very useful; why would I want to find an exponential power of The Wizard of Oz? But because some mathematical operations, like exponents, are more difficult to do in reverse, we can use math to obscure the contents of a file to anyone who doesn’t have some piece of information, some password or a large secret number called a key. This is cryptography. Cryptography can also be used to verify that files and messages you send to someone over the internet haven’t been tampered with by a third party or become corrupted, and to verify your identity electronically; it has recently even formed the basis of experimental currencies. Today, most web sites use some form of cryptography to reduce the risk of someone interfering with or snooping on user activity. A related field of techniques called steganography allows us to disguise secret files and messages so an interloper might not even notice we have anything to hide. By slightly altering the color codes of a PNG image file, we can scatter the bits of a secret text file throughout the image file in such a way that only the recipient knows how to recover them and piece them back together.
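
The steganography idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it works on a plain sequence of color values rather than an actual PNG file, and it stores the secret bits in order in the lowest bit of each value instead of scattering them in a pattern known only to the recipient. The function names hide and recover are made up for this example.

```python
def hide(channels: bytes, secret: bytes) -> bytes:
    """Store each bit of `secret` in the lowest bit of one color value."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(channels):
        raise ValueError("cover image is too small for this message")
    altered = bytearray(channels)
    for i, bit in enumerate(bits):
        altered[i] = (altered[i] & 0xFE) | bit   # change only the lowest bit
    return bytes(altered)


def recover(channels: bytes, length: int) -> bytes:
    """Read `length` bytes back out of the lowest bits."""
    message = bytearray()
    for start in range(0, length * 8, 8):
        byte = 0
        for value in channels[start:start + 8]:
            byte = (byte << 1) | (value & 1)
        message.append(byte)
    return bytes(message)


cover = bytes(range(256))            # stand-in for an image's color values
altered = hide(cover, b"hi")
print(recover(altered, 2))

# No color value moves by more than one step out of 256.
print(max(abs(a - b) for a, b in zip(cover, altered)))
```

A shift of one step in a single color component is far below what a viewer would notice, which is why the altered image passes for the original.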

So we’ve established that any artwork stored digitally exists as a number. In computing, numbers don’t just represent art in the way that a library catalog number represents a book; they constitute art in the sense that the string of text that runs from the beginning of that book to the end constitutes the book. Maybe there’s a number that can almost always make you smile in spite of yourself. Maybe there’s a number that has the power to reduce you to tears. And maybe some numbers are illegal.

Not all human expression is legal in all contexts. Some of it is illegal for pretty obvious reasons; its exposure may pose a direct threat to someone’s health and safety. But some is illegal because it inconveniences a state or a corporation that holds some sway over a state. One of these less obvious cases gives us the most famous example of an illegal number.

When the home video market started moving from VHS videocassettes to DVDs in the late 1990s, corporate players in that market hoped to prevent the kind of home copying of videos that was possible with VCRs. A cat-and-mouse game commenced, where industry associations implemented various content-scrambling and other so-called “copy protection” schemes, and users of various internet fora wrote software to circumvent these measures so people who had commercial DVDs could make copies of them – or simply watch them on Linux-based operating systems. One such piece of software, called DeCSS, prompted extensive legal action from the DVD Copy Control Association and the Motion Picture Association of America, on the grounds that DeCSS enabled people to infringe copyright law. A seventeen-year-old developer of DeCSS was brought to court in Norway, and anyone who hosted copies of this software, of software like it, or of software with a similar-sounding name, was bombarded with threats of litigation. This provoked a lot of pushback from the libertarian-leaning hacker subculture, from free and open source software enthusiasts, and from video collectors on the internet. People who were upset with the corporate entertainment industry’s actions started doing everything they could to frame what the industry called “copy protection” as an attack on free speech and free expression. They started expressing the algorithms used in DeCSS in novel and artistic ways – hidden in visual art or music, described in metered poetry – and essentially dared the lawyers to challenge their artworks in court.

Computer scientist Phil Carmody even found that if he compressed the source code for DeCSS in a certain way, the number that constituted the file data was a prime number, and that publishing the number was therefore imperative to academic mathematicians, but was also potentially illegal.

Ultimately, industry lawyers couldn’t fully suppress the proliferation of DeCSS and related software. But when the HD-DVD and Blu-Ray Disc formats supplanted DVDs, the concept of illegal numbers re-emerged in a less esoteric fashion. Commercial Blu-Ray discs are often encrypted with a secret number thirty-two hexadecimal digits long, so that only Blu-Ray players or software that have that secret number can correctly play the disc, allowing corporations to control the design of Blu-Ray players through licensing schemes. But through leaks or reverse-engineering, people began to obtain these secret numbers and distribute them online. When the social news web site Digg bowed to pressure to retract one of these secret numbers after it appeared in an article in 2007, this war between the interests of corporate intellectual property and proponents of unencumbered sharing of information reignited, with people once again propagating the forbidden information by all available means.

It was in the midst of this situation that blogger John Marcotte created what he called a “Free Speech Flag.” The flag at once symbolizes the anti-intellectual-property camp in this conflict and expresses the forbidden number itself. The flag is divided into five vertical stripes. The hexadecimal codes for the colors of these stripes, from left to right, are 09F911 029D74 E35BD8 4156C5 635688, and the text +C0 appears in the lower-right corner.
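
The arithmetic behind the flag is simple enough to check in Python. This sketch uses only the color codes and corner text read out above, and confirms that five six-digit colors plus the two digits in the corner add up to a number thirty-two hexadecimal digits long. The variable names are my own.

```python
# The five stripe colors, left to right, and the text from the corner
# (without its plus sign), exactly as given in the episode.
stripes = ["09F911", "029D74", "E35BD8", "4156C5", "635688"]
corner = "C0"

# Five colors of six digits each, plus two more digits.
digits = "".join(stripes) + corner
print(len(digits))

# Read as a single number, the way a computer would store it.
number = int(digits, 16)
print(number.bit_length())           # how many bits the number occupies
```

Each hexadecimal digit stands for four bits, so thirty-two digits fill one hundred twenty-eight bits; the leading zero digit is why the count printed here comes out slightly lower.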

Closing theme music

This episode of The World on a Wire Show was written and narrated by me, Dominique Cyprès. The numbers station audio heard at the beginning of the episode was recorded by kwahmah_02 of freesound.org. The Morse code transmission I used was recorded by freesound.org contributor Trebblofang. The opening theme music was “Come Inside” by Zep Hurme featuring Snowflake. The closing theme is “Start Again” by Alex Beroza. Get the latest updates on The World on a Wire Show at patreon.com/lunasspecto; that’s Lima Uniform November Alpha Sierra Sierra Papa Echo Charlie Tango Oscar. Any numbers that appear in this episode are shared for educational purposes.

This work is licensed under a Creative Commons Attribution 4.0 International License.