The image society: How Nasir Ahmed shaped image-based life

BY STEVEN JOHNSON

Fifty years ago, the computer scientist Nasir Ahmed came up with a brilliant idea for compressing data, laying the groundwork for Zoom, YouTube, Instagram, and all the other cornerstones of today's image-based online life.

Illustration: Agata Nowicka

Nasir Ahmed

Our contemporary digital existence is so saturated with images—all those TikTok stars and YouTube instructional videos and iOS family photo collections—that it is difficult to recall that for many years the Internet (and the Web) looked like it was going to be all about the resurgence of text. For most of the post-WWII era, the increasingly influential medium of television had suggested that we were headed towards a society dominated by images, not words. But when computers—and particularly network-connected computers—first became part of mainstream culture, they were largely text-based machines. The original architecture of the World Wide Web, conceived by Tim Berners-Lee in the 1980s, couldn’t display still images at all; the idea of watching a video over the Internet would have seemed preposterous back then.

Nasir Ahmed, 2012, ©Jogfalls 1947, CC BY-SA 3.0, via Wikimedia Commons

All that eventually changed, of course. For better and for worse, the “image society” has returned with a vengeance over the past decade or two. We don’t think twice about taking a photo of our sandwich and uploading it to the Web; we watch, as a species, roughly five billion YouTube videos every single day. That momentous change was the result of countless innovations in both software and hardware: new standards that allowed us to share photos and videos over the open Web; new high-bandwidth infrastructure; and startups like YouTube and Instagram that popularized sharing images. But one of the most important contributors to the image counter-revolution was an Indian-born computer scientist named Nasir Ahmed, who had an idea back in 1972 that the experts thought was “too simple” to warrant giving him a grant to develop it. Nearly fifty years later, that idea is lurking behind just about every image you see online.

Difficult beginnings. Illustration: Agata Nowicka

A picture worth a thousand words

Ahmed was born in 1940 in Bangalore, India, where he was raised by his maternal grandparents. “My grandfather had a tremendous impact on my life,” Ahmed recalls. “He was an electrical engineer, and in 1919 he was sent to Schenectady, NY to work for General Electric. He loved the people and the place so much that he always encouraged me to study electrical engineering and pursue my graduate education in the United States.” Inspired by his grandfather’s words, Ahmed moved to the United States in the early 1960s to do graduate work at the University of New Mexico. As a grad student in Albuquerque, he met an Argentinian emigre named Esther Pariente; the two fell in love and remain married today, 57 years later. In 1966, he briefly took a job at the conglomerate Honeywell, which had just launched a computer division to compete with the then-dominant IBM. Even at this early stage in his career, Ahmed was a pioneer of sorts. Today we take it for granted that people of South Asian descent play important roles in the digital revolution. Just look at the org charts of Big Tech companies: currently, the CEOs of Alphabet (Google), Microsoft, and Twitter are all Indian Americans. But when Ahmed first began exploring the world of cutting-edge algorithms, most technology firms in the nascent hubs of Boston and the Bay Area were overwhelmingly, if not exclusively, made up of white Americans.

Indian Americans in the San Francisco Bay Area

The recent growth of Indian-American immigrants in the US tech sector has been a dramatic demographic transformation, though it is not often remarked upon. All three of the CEOs (Twitter’s Parag Agrawal, Microsoft’s Satya Nadella, and Alphabet’s Sundar Pichai) emigrated to the United States after being born in India. (In addition to those three, Silicon Valley is represented in Congress by a progressive Indian American, Ro Khanna.) The growth in Indian-American tech involvement applies outside the executive suites as well. While Indian-born people make up about 1% of the US population, they make up 6% of the workforce in Silicon Valley.

The picture by the Manhattan Mercury newspaper in 1962, the year when Nasir Ahmed (on the left) met his future wife, Esther Pariente (in the middle). Courtesy of Nasir Ahmed

At Honeywell, Ahmed was first exposed to a technique for analyzing digital signals known as Walsh functions. When he returned to academia several years later, first at Kansas State and then back at the University of New Mexico, he retained his fascination with these cutting-edge mathematical techniques. “We worked both late in the night and again early mornings,” his then-collaborator Ram Mohan Rao recalled. “That is when we developed mutual interests in the Walsh functions, Walsh transform, BIFORE transform, complex Hadamard transform and as well as the generalized discrete transforms, their fast algorithms and basically their applications.” Those exotic algorithms could serve many purposes: today, they are used in diverse fields that range from speech recognition to radio astronomy. But Ahmed was particularly interested in one specific application: using the tools of signal processing and analysis to reduce the file size of an image without sacrificing too much visual fidelity to the original.

The fundamental problem with digital images is that they contain so much more information than text, at least as measured by the elemental digits of binary code. The ideas embedded in language are encoded at an incredibly efficient rate when we turn them into letters and words. Images, on the other hand, require far more data to do their magic. We all know the saying “a picture is worth a thousand words.” A text file that contains a thousand words is about 5 KB. An uncompressed photo taken with a high-resolution modern camera can be more than 20 MB. To give a sense of the magnitude here, if the text file were a five-story building, the image file would be a skyscraper with 20,000 floors, higher than Mount Everest. A picture may or may not be worth a thousand words, but it definitely requires more information to share it.
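The building comparison follows directly from the two file sizes. A quick check of the arithmetic, assuming decimal units (1 KB = 1,000 bytes) and treating both sizes as the round figures quoted above:

```python
# Round figures from the text; decimal units are an assumption.
TEXT_BYTES = 5_000          # about 1,000 words of plain text (5 KB)
PHOTO_BYTES = 20_000_000    # an uncompressed high-resolution photo (20 MB)

ratio = PHOTO_BYTES // TEXT_BYTES   # how many such text files fit in one photo
floors = 5 * ratio                  # scale a five-story building by the same factor

print(f"The photo is {ratio:,} times larger than the text file")
print(f"A five-story building scaled up the same way has {floors:,} floors")
```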

A picture is worth a thousand words. Illustration: Agata Nowicka

Video is even more demanding. At 30 frames per second, just a few minutes of uncompressed video contains as many zeroes and ones as an entire library of books. And remember, all of that information needed to be sent over low-bandwidth dial-up modems back in the 1990s. Sending the data necessary to recreate a high-resolution photo on another person’s computer was like trying to empty a swimming pool through a drain the size of a drinking straw.

Performance of dial-up modems

The first “onramp” to the Internet for most ordinary consumers in the 1990s was a dial-up modem that plugged into an ordinary telephone line and transmitted digital information via sound. The modems were relatively inexpensive and had the advantage of working on existing communications infrastructure, but they had several disadvantages that ended up making them obsolete by the early 2000s. To begin with, they tied up your phone line while you were online, and they could take as long as a minute to make an initial connection to the Internet (or an online service like AOL). But more importantly, they had an astonishingly low bandwidth by today’s standards. Even on one of the faster dial-up modems from the 1990s, downloading a two-hour HD movie would have taken about 200 hours.
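The 200-hour figure is easy to sanity-check. The sketch below assumes a 56 kbit/s modem running at its full nominal rate and a two-hour HD movie of roughly 5 GB; neither number comes from the article, and real connections were usually slower.

```python
# Assumptions (not from the article): nominal modem speed and movie size.
MODEM_BITS_PER_SECOND = 56_000
MOVIE_BYTES = 5_000_000_000


def download_hours(size_bytes: int, bits_per_second: int) -> float:
    """Return the transfer time in hours, ignoring protocol overhead."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 3600


hours = download_hours(MOVIE_BYTES, MODEM_BITS_PER_SECOND)
print(f"About {hours:.0f} hours of continuous downloading")
```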

All of which meant that the problem of shrinking image data down to manageable sizes was of vital importance if the Internet was ever going to evolve into a medium capable of sharing images and video. Like so much of the digital revolution, the idea of compressing data had its roots in the codebreakers and codemakers of World War II. Thanks to his work on encryption in the mid-1940s, the American mathematician Claude Shannon began thinking about how you could take a message (for instance, a secret military communication) and reduce its overall size, or transmit it over a noisy channel where parts of the message might get lost en route to the recipient. Shannon was mostly thinking about text messages back then, but the general approach he outlined would ultimately apply to images as well. In a series of influential books and papers written in the late 1940s, Shannon proposed the crucial distinction between “lossless” and “lossy” compression. In a lossless system, the original message is compressed before transmission and then expanded back to its original state by the recipient. Lossless is the ideal approach, but there tends to be a limit on just how much you can compress the file. But Shannon recognized that many messages can actually lose some information and still be legible to the recipient, thus enabling much higher compression rates. To give a very simple example, imagine a crude compression scheme that simply eliminated the fourth letter of every word in a sentence:

Com•ression app•oaches can be div•ded int• the two cat•gories of los•y and los•less.

Reading a sentence “compressed” in that fashion takes a little more effort, but the message is ultimately conveyed because your brain fills in the blanks. You don’t actually need the “r” in “approaches” or the second “i” in “divided” because those letters are effectively redundant: there’s no other word in the English language that matches the overall pattern of the word. What Ahmed was searching for in his deep dive into Walsh functions and complex Hadamard transforms was a way of doing something comparable with the pixels that make up an image. Imagine a close-up photograph of your child’s face with a soft-focus backdrop of clear blue sky. In an uncompressed image, each pixel that corresponds to the sky will have the exact same informational “weight” as the pixels that define your child’s smile or the shape of their eyes. But if you could somehow just replace all those individual background pixels (each one carrying exact data about the specific shade of blue of that specific part of the image) with a blanket command to make the backdrop one uniform shade of blue, the image would look less realistic, but it would still convey the general feel of the original. But you couldn’t do the same transformation with the pixels that represent your child’s face. In the language that Shannon helped define, the face pixels are high in information and shouldn’t be compressed. The sky pixels are low in information and so can be compressed down without compromising much of the original image. In the language of signal processing, those regions are said to be low on “signal.”
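The fourth-letter scheme is simple enough to try directly. The toy sketch below applies it to the example sentence and, for contrast, runs the same sentence through Python's standard zlib module as a stand-in for a lossless method. The function name and the bullet placeholder are illustrative choices; a real scheme would drop the letter outright rather than mark the gap.

```python
import zlib


def drop_fourth_letter(sentence: str, placeholder: str = "•") -> str:
    """Lossy toy scheme: blank out the fourth character of every word
    that has at least four characters."""
    words = []
    for word in sentence.split(" "):
        if len(word) >= 4:
            word = word[:3] + placeholder + word[4:]
        words.append(word)
    return " ".join(words)


original = ("Compression approaches can be divided into the two "
            "categories of lossy and lossless.")

# Lossy: the dropped letters are gone; only a human reader can guess them.
lossy = drop_fourth_letter(original)
print(lossy)

# Lossless: the round trip recovers every character exactly.
packed = zlib.compress(original.encode("utf-8"))
restored = zlib.decompress(packed).decode("utf-8")
print(restored == original)
```

Once the letters are dropped, no program can restore them with certainty. That is the defining trade of a lossy scheme: it bets that the recipient can tolerate, or mentally repair, what was thrown away.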

Claude Shannon

Known as the "father of information theory," Claude Shannon was an American mathematician, electrical engineer, and cryptographer. In 1948, he published a groundbreaking paper entitled "A Mathematical Theory of Communication" that defined how we can think about communication. He used insights from his wartime research on fire-control systems and cryptography at Bell Labs because, as he described, “they [communication and cryptography] were so close together you couldn’t separate them.” Through his research, he essentially invented the field of information theory that fundamentally contributed to natural language processing and computational linguistics.

And so this was the challenge faced by the early pioneers of image compression: how do you create a mathematical formula that can tell the difference between high and low-information parts of an image? How can you teach a computer to tell the difference between a unique face and a uniform sky?

Grit and confidence to make a change

In the late 60s and early 70s, many brilliant mathematical minds were wrestling with the problem of compression algorithms. New approaches were being published in academic journals, but no clear winner had yet emerged. Much like the early days of the Web itself, it was a period of flux and experimentation, with many potential solutions in circulation.

“[By] 1972,” Ahmed says now, “I had a strong intuition that I could find an efficient way to compress digital signal data. The computers were huge and my PhD student T Natarajan was responsible for writing the computer program in the form of a deck of cards to process on a very large IBM computer!! In contrast, today’s technology enables one to do so on a laptop computer or even smaller device.” Ahmed was so excited about the idea that he immediately wrote up a grant proposal and sent it off to the National Science Foundation. “To my surprise,” Ahmed recalled, “the reviewers said the idea was too simple, so they rejected the proposal.” With that rejection, Ahmed faced a crossroads. He was convinced he was onto something powerful, something that would enable a truly quantitative approach to image compression. But he had no funding to support the research, and the alleged experts at the NSF had been unimpressed with the idea.

Nasir Ahmed and his wife, Esther Pariente © Gaston Bigio, GUT Agency

“University salaries covered only nine months,” Ahmed explains. “Hence, I had to move during the summer months with my family to different places in search of funds for our living expenses, as well as to get support for the university and my graduate students.” But because the new compression algorithm seemed so promising, Ahmed was tempted to break that routine. He consulted with his kitchen cabinet—namely his wife, Esther. “I have this intuition that this thing is the way to do it,” he told her. “But can we afford to take three months without any salary?” His wife replied that they’d figure out a way to manage, and Ahmed took the summer of 1972 to work on the idea without any funding.

Kitchen cabinet. Illustration: Agata Nowicka

The approach Ahmed ultimately hit upon borrowed from the influential ideas of the 19th-century French mathematician Joseph Fourier: it turned the array of pixels that described a digital image into a waveform, effectively taking something spatial in nature and expressing it in terms of frequency. Using techniques derived from Fourier’s original insights, that pattern could then be described as the sum of a series of cosine functions. Ahmed realized that when you performed this kind of transformation on a typical image, most of the visually important information ended up concentrated in the low-frequency waves, while the high-frequency waves tended to carry fine detail that the eye barely misses. (In our example image, the overall shape and shading of the child’s face and the smooth expanse of blue sky would translate into low-frequency waves, while grain and subtle texture would be high-frequency.) The procedure Ahmed invented, which he called the Discrete Cosine Transform, transformed the spatial image into a collection of waves, eliminated the low-information waves, and then converted the whole file back into an image. It was a “lossy” form of compression, in Shannon’s phrasing. Whatever subtle textures or image grain might have been there in the background blue sky would be gone for good once you ran the image through DCT. But the most significant visual elements would remain, and the compressed file itself would be as much as ten times smaller than the original.
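The transform-discard-rebuild idea can be sketched in a few lines. The code below is a minimal one-dimensional illustration, not the procedure from Ahmed's paper or from any image standard: it uses the common orthonormal form of the DCT, works on a single made-up row of eight brightness values, and simply zeroes the highest-frequency coefficients. JPEG, by comparison, works on two-dimensional 8×8 blocks and quantizes coefficients rather than cutting them off outright.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: describe the samples as a sum of cosine waves."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the cosine coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def compress(samples, keep):
    """Lossy step: keep only the first `keep` (lowest-frequency) coefficients."""
    coeffs = dct(samples)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)


# Hypothetical row of eight pixel brightness values: a smooth gradient.
row = [52, 55, 61, 66, 70, 73, 75, 76]
approx = compress(row, keep=3)
worst_error = max(abs(a - b) for a, b in zip(row, approx))
print([round(v) for v in approx], f"worst error: {worst_error:.2f}")
```

For a smooth row like this one, three of the eight coefficients are enough to rebuild every pixel to within a few brightness levels. A row with sharp, noisy detail would lose more, which is exactly the texture and grain the article describes as gone for good.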

Testing the algorithm on actual computers was a laborious process in the early 1970s. “At that time, we were using [IBM mainframes] and the IBM computer cards to write the programs,” Rao recalled. “So we had to wait till the next day to receive our IBM cards and submit the program using the IBM cards.” But slowly, punch card experiment after punch card experiment, Ahmed began to think he had devised an approach that out-performed other compression schemes. At a scholarly conference on Walsh functions in New Orleans, Ahmed ran into a sometimes-collaborator from the University of Southern California named Harry Andrews and talked up his new approach. Andrews suggested a computer program that could test the efficiency of DCT, and later sent Ahmed a copy of the software so that he could run the test himself. When Ahmed sent Andrews the results of the software test, Andrews told him in no uncertain terms: “You need to publish this right away.”

Discrete Cosine Transform. Illustration: Agata Nowicka

In January of 1974, the IEEE Transactions on Computers published a short paper, with Ahmed as the lead author and the simple title “Discrete Cosine Transform.” To a non-specialist, the significance of the advance it described was difficult to discern. “It is shown that the discrete cosine transform can be used in the area of digital processing for the purposes of pattern recognition and Wiener filtering,” the paper’s abstract announced. But even an expert reading Ahmed’s paper at the time (even Ahmed himself, as it turns out) would have had a difficult time anticipating the long-term impact of the breakthrough.

It took more than a decade for DCT to reach critical mass. The turning point came during the late 1980s and early 1990s when a standards group known as the Joint Photographic Experts Group made it a cornerstone of a new file format for photographs, with an acronym derived from the group’s name: JPEG. It was propitious timing: the Web added image support just a few years later, just as digital photo cameras started to become mainstream consumer items. Today, variations of DCT are embedded in dozens of different audio and video standards: in the MP3 standard that revolutionized the music business in the days of Napster; in the Dolby Digital surround sound thundering through your home theater speakers; and in the video images that brought so many of us together during the days of the COVID lockdowns.

DCT adoption

The discrete cosine transform is by far the most widely used linear transform in data compression, with applications ranging from audio, image, and video compression to cryptography, watermarking, surveillance, and medical technologies such as electrocardiography (ECG), vectorcardiography (VCG), and medical imaging. While JPEG was the image compression standard that brought DCT to wide adoption, the transform also underlies other familiar formats: H.264 (also known as MPEG-4 Advanced Video Coding), the most common HD video format for Internet video, used by YouTube, HDTV broadcasts, web browsers, streaming television, mobile devices, Netflix, video telephony, and FaceTime; and Dolby Digital, used in cinemas, streaming media, and video games.

A remarkable gift to us all

Among his peers, Ahmed’s work was widely admired; he went on to have a long and influential career as an academic scholar, and was eventually appointed as Dean of the Engineering school at the University of New Mexico. But unsurprisingly, Ahmed’s contribution was ignored by the vast majority of the users who benefited from the compression magic of DCT. It took a pandemic for Ahmed to finally begin to get the recognition he deserved: in early 2021, at the height of the COVID-Zoom era, the hit television series This Is Us aired an episode that featured the story of Ahmed’s DCT breakthrough, interspersed between stories where the show’s main characters use Zoom and FaceTime technology to keep in touch during two separate childbirths while remaining geographically distant.

Young Nasir Ahmed. Courtesy of Nasir Ahmed

This Is Us featuring Ahmed

A few months into the COVID pandemic, as the world was acclimating to the new communications platforms of Zoom and FaceTime, This Is Us creator Dan Fogelman had the idea of writing an episode that would somehow integrate the story of one of the innovators behind these critical technologies. “Literally all of our little writers' brains googled computer programming and computer engineering to try to figure out who might've come up with the idea of FaceTime,” writer Vera Herbert recalled. “We came across the story of Nasir, who had invented the Discrete Cosine Transform, and we didn't honestly know what it was, but we did enough research to realize that it's the basis of this technology.” Ahmed and his wife are played by actors, but the final seconds of the episode feature a brief clip of the Zoom conversation between the actual couple and the show’s producers.

“It is a wonderful feeling I get when I see that the DCT is still being used in various image compression techniques,” Ahmed says. “To be honest… my first experience with the wonderful Zoom was when we were interviewed by the amazing and extremely patient folks [from] This Is Us.”

“It took him almost 50 years to be recognized,” Ahmed’s wife Esther said in a recent interview. “But what he appreciates the most is not the academic recognition. [It’s] the people, the teenagers; my own son is now telling me: ‘I’ve never thought about the impact my father had.’ To me, that’s a gem.”

While it was a powerful change of pace to see a network television program pay tribute to an otherwise obscure computer scientist, This Is Us did tweak the facts slightly to make the narrative more compelling. In the final scene with Ahmed, his young wife angrily asks why he is missing so much family time with her and their young son in pursuit of this obscure mathematical function. The fictional Ahmed then launches into an impassioned monologue, explaining that the world is headed toward a future in which people will communicate via video with friends and family over the Internet, and that his algorithm will make all that possible. It turned out to be true, of course, so you can forgive the writers for using their dramatic license to have Ahmed play the role of tech visionary. But in reality, Ahmed acknowledges he was focused more on the immediate technical challenges in the early 1970s, not the long-term consequences. “I had no idea it would become a major contribution,” he told an interviewer last year. “In those days, digital technology was just starting.”

No doubt it is partly because of that singular focus—and the complexity of the math itself—that we tend to neglect pioneers like Nasir Ahmed when we put names in the pantheon of high-tech progress. He might have been better off in terms of his own name recognition if he had coupled the academic publications with bombastic pronouncements about where the future would take us, or left the university to lead a venture-backed startup. But sometimes the contribution we need as a society is not a wide-angle vision of the future. Sometimes, what we need is a compression algorithm that does its job reliably, and that knows which signals to preserve and which ones to leave behind. And we need that approach to be shared freely with the world, not locked away as a proprietary standard. That is the gift that Nasir Ahmed gave us.

The discrete algorithm that transformed the world

Ahmed’s story, both the geographic trajectory of his life as an early Indian-American tech innovator and the long-term communications impact of the Discrete Cosine Transform, is really a story about globalization, about the increasing connection between once remote people and places. If you were an American interested in Bangalore in the early 1960s, when Ahmed emigrated to the U.S., you had virtually no way of experiencing it beyond traveling there in person, or perhaps tracking down a handful of books in the library. Today, you can just open your laptop and watch a video about Bangalore as a new high-tech hub or scroll through an endless collection of Instagram photos documenting life there, among thousands of other options.

Image society. Illustration: Agata Nowicka

Our extraordinary ability to see anything, anywhere in the world—and not just share text about it—is partly the result of Nasir Ahmed’s breakthrough idea about image compression from nearly half a century ago. Small files make for a small world.

Steven Johnson is the bestselling author of 13 books, including Where Good Ideas Come From. He’s the host of the PBS/BBC series Extra Life and How We Got to Now. He regularly contributes to The New York Times Magazine and has written for Wired, The Guardian, and The Wall Street Journal. His TED Talks on the history of innovation have been viewed more than ten million times.