Who can tell when they’re living through history? On April 12, 1994, people logged on to their Internet providers and checked Usenet (which, in the days before widespread adoption of the Web and bulletin board websites, was the preeminent means of mass communication on the Internet). In addition to the usual (the cranks on sci.physics, the discussion of future Squiddy winners on rec.arts.comics, the bizarre talk on talk.bizarre), they had an exciting message awaiting them: "THE DEADLINE HAS BEEN ANNOUNCED". An Arizona law firm, Canter & Siegel, had targeted an ad at anyone on the Internet trying to enter the American green card lottery. Whether this service was useful was beside the point; Canter & Siegel had figured out how to abuse the system to contact hundreds of thousands of potential customers. As one somewhat bewildered user reported, "Everywhere you went, it was Green Card, Green Card, Green Card." The green card lawyers had, for all intents and purposes, invented spam.
The first usage of "spam" is disputed, but the term obviously comes from a Monty Python sketch rather than the potted meat product so popular in Hawaii. The actual first piece of spam, defined broadly as an unsolicited message sent in bulk, seems to have been sent that January by someone who crossposted it to dozens of unrelated newsgroups. After the green card spam, Canter & Siegel became possibly the most hated people on the Internet, and their ISP yanked their access, but the genie was out of the bottle. As Usenet withered thanks to the rise of the Web (and the flood of spam that made newsgroups unreadable without a well-configured killfile), spammers moved to sending bulk email, and they did so with a vengeance.
Back in the day, bulk email wasn’t such a problem. Today, with spammers regularly owning lists of 150 million addresses, over half of the email coming across the wires is spam, and it costs companies millions in bandwidth charges. Since credible reports have spammers earning $100,000 a year and up, nobody sees spam going away anytime soon.
Lisp guru Paul Graham came up with a means of stopping spam in 2002; he wrote a brief explanation, "A Plan for Spam", suggesting the use of Bayesian analysis, in which certain words and patterns are given a probability of correlating to good and bad messages, to sort incoming mail into likely spam and likely "ham". Graham’s suggestion led to a whole slew of Bayesian spam filters, and to Bayesian scoring in existing tools like SpamAssassin, that could be trained to learn that "Viagra" and "CLICK HERE" were likely signs of unwanted mail. Spammers fought back by stuffing dozens of random words into their messages to make filters think their mail is hammy, and by going through increasingly convoluted steps to disguise the words that indicate spam. Anti-spammers have come up with a variety of next-generation proposals: "hashcash", in which a calculation costing a trivial but measurable amount of processor time must be performed before mail is delivered (not a problem for mail to your boss; a problem for fifteen million Nigerian bank director announcements); greylisting, in which mail from an unfamiliar sender is temporarily refused, on the theory that a legitimate mail relayer will follow the protocol and try again while a spammer’s software will not; or CRM114, a Bayesian filter combined with a regular expression parser to learn some of the bizarre circumlocutions that spammers use.
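To make the Bayesian idea a little more concrete, here is a minimal Python sketch. It is loosely modeled on the approach described in "A Plan for Spam" but is not Graham's code: the function names, the toy training messages, the tokenizer, the 0.4 score for unseen words, the 0.01/0.99 clamping, and the 15-token cutoff are all assumptions chosen for illustration, and a real filter would train on thousands of messages.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly one token suggests spam, clamped to [0.01, 0.99]."""
    bad = spam_counts[token] / n_spam
    good = ham_counts[token] / n_ham
    if bad + good == 0:
        return 0.4  # assumed default for a token never seen in training
    return min(0.99, max(0.01, bad / (bad + good)))

def spam_score(text, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Combine the most telling token probabilities into one score in (0, 1)."""
    probs = [
        token_spam_prob(t, spam_counts, ham_counts, n_spam, n_ham)
        for t in set(tokenize(text))
    ]
    # Keep only the tokens farthest from a neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    spam_prod, ham_prod = 1.0, 1.0
    for p in probs[:top_n]:
        spam_prod *= p
        ham_prod *= 1.0 - p
    return spam_prod / (spam_prod + ham_prod)

# Hypothetical toy corpus, far too small for real use.
spam_examples = ["CLICK HERE for cheap Viagra", "Viagra deals, click here now"]
ham_examples = ["Lunch meeting moved to noon",
                "Here are the meeting notes from Monday"]
spam_counts = train(spam_examples)
ham_counts = train(ham_examples)

def classify(text):
    """Score a message against the toy corpus above."""
    return spam_score(text, spam_counts, ham_counts,
                      len(spam_examples), len(ham_examples))

print(classify("Click here for Viagra"))    # close to 1: likely spam
print(classify("Meeting notes for lunch"))  # close to 0: likely ham
```

The sketch also shows why the random-word trick works: every innocent-looking word a spammer adds contributes a low probability to the product and drags the combined score back toward ham.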
These methods will probably work, at least for a while. Spammers will find a workaround (possibly one using the legion of zombie servers, home machines commandeered for their use, that they’ve built up over the past few years). Some bright people, out of irritation at their inbox being full or a desire to make some money, will come up with a workaround for the workaround, and the arms race will continue. Meanwhile, the brave new worlds of SMS spam and IM spam have begun to open up. Anil Dash recently observed what happened when AOL decided to include CDs in copies of the Village Voice:
We all knew that after the apocalypse it’d just be cockroaches and televangelists and some militia members camped out in bomb shelters in Montana. But I have yet to see a sci-fi writer who correctly predicted that the twenty-first century would come along and we’d all be literally walking on discarded piles of digital recordings that promised us the ability to instantly connect with anyone in the world, free of charge. But hey, whadaya know. I’ve got mail.
It may not do my inbox any good, but it wouldn’t be the twenty-first century if there weren’t shady characters on the fringes of society breaking the law in cyberspace for ideology and profit. There’s a shadowy war going on between the forces of capitalist rebellion and society’s self-appointed defenders! Zombie server farms! Virus writers targeting millions of computers! Russian black market credit card lists! Black ice! Things never work out the way one expects (who sees themselves in the role of repressive defender of the status quo?), but if only the spammers would wear mirrorshades, science fiction’s predictions would almost have come true. The future is now.