Built-in and add-on spam busters, programs designed to work with your e-mail program to eliminate spam, determine what is and isn’t spam in a variety of ways. They may use some or all of the following techniques:
Whitelists
A common antispam practice is to add the contacts in your e-mail program to a list of approved senders. These whitelists include the addresses of the people you presumably welcome e-mail from. This can be an effective technique, except when a whitelist contact’s address is spoofed, or impersonated, by a spammer. For example, it’s not uncommon to receive spam supposedly sent from your own e-mail address. Unless a whitelist feature is designed to remove addresses that are later designated as spam, the whitelist will let these messages through.
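To make the spoofing weakness concrete, here is a minimal sketch of a contacts-based whitelist in Python. It assumes the filter checks only the address in a message’s From header. The `Whitelist` class and its methods are hypothetical. A forged From line carrying a whitelisted address passes, unless the optional `mark_spam` safeguard removes that address once a message from it is flagged.

```python
from email.utils import parseaddr


class Whitelist:
    """Hypothetical approved-senders list seeded from your contacts."""

    def __init__(self, contacts):
        # Compare addresses case-insensitively (an assumption of this sketch).
        self.approved = {addr.lower() for addr in contacts}

    def allows(self, from_header):
        # Only the From header is consulted, and that is exactly what a spoofer forges.
        _, addr = parseaddr(from_header)
        return addr.lower() in self.approved

    def mark_spam(self, from_header):
        # Optional safeguard: drop an address once mail "from" it is flagged as spam.
        _, addr = parseaddr(from_header)
        self.approved.discard(addr.lower())


if __name__ == "__main__":
    wl = Whitelist(["me@example.com", "alice@example.org"])
    print(wl.allows("Alice <alice@example.org>"))  # True: a real contact
    print(wl.allows("Me <me@example.com>"))        # True: a spoofed copy of your own address passes too
    wl.mark_spam("Me <me@example.com>")
    print(wl.allows("Me <me@example.com>"))        # False once the address is removed
```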
Blacklists
Sometimes called blocklists, blacklists are the opposite of whitelists—they’re lists of unwelcome addresses. When you train a spam utility, you typically feed it multiple bad e-mail messages. Often, the senders of these messages are added to a blacklist that the spam utility uses to help sort your mail. Blacklists are also assembled and maintained by a variety of groups on the Internet. Among the most popular is the Spamhaus Block List—a database of IP addresses of verified spam sources. Most Mac spam utilities don’t check such lists, but the more advanced spam filters used by ISPs often do.
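Shared lists such as the Spamhaus Block List are usually queried over DNS. The sending server’s IPv4 octets are reversed and prepended to the list’s zone (`sbl.spamhaus.org` for the SBL), and an answer in 127.0.0.x signals a listing. The sketch below assumes that convention. `LocalBlacklist` is a hypothetical stand-in for the sender list a Mac utility builds as you train it. Spamhaus restricts free use and may not answer queries relayed through large public resolvers, so check its usage terms before relying on this.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="sbl.spamhaus.org"):
    """Reverse the IPv4 octets and append the blocklist's DNS zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_listed(ip, zone="sbl.spamhaus.org"):
    """True if the list answers with a 127.0.0.x address, meaning 'listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # NXDOMAIN (not listed) or a failed lookup; this sketch fails open
        return False
    # Count only 127.0.0.x; other answers may be error codes from the list operator.
    return answer.startswith("127.0.0.")


class LocalBlacklist:
    """Hypothetical sender list a spam utility grows as you train it on bad mail."""

    def __init__(self):
        self.blocked = set()

    def train_spam(self, sender):
        self.blocked.add(sender.lower())

    def is_blocked(self, sender):
        return sender.lower() in self.blocked
```

For example, `dnsbl_query_name("192.0.2.1")` yields `1.2.0.192.sbl.spamhaus.org`. RFC 5782 asks DNS blocklists to list 127.0.0.2 as a test entry, so `dnsbl_listed("127.0.0.2")` should return True wherever your resolver is permitted to query the list.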
Challenge-response systems
While not technically a spam filter, a challenge-response system is a tool some people use to fight unwanted mail. An e-mail account equipped with a challenge-response system demands that unknown senders confirm their identity by responding to a challenge (for instance, going to a Web site and manually typing a series of letters and/or numbers). Once the response is accepted, the sender is added to the recipient’s whitelist. Though effective, a challenge-response system makes your e-mail problem everyone else’s problem by forcing legitimate correspondents to jump through hoops. Many people who have good reason to communicate with you will refuse to do so when confronted with a challenge-response system.
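The flow described above (hold mail from unknown senders, issue a challenge, whitelist the sender after a correct response) can be sketched in Python as follows. The class, its methods, and the six-character hex code are illustrative assumptions. A real service would deliver the challenge to the sender, typically as a link to a Web page, instead of exposing it through `challenge_for`.

```python
import secrets


class ChallengeResponseGate:
    """Hypothetical challenge-response layer in front of an inbox."""

    def __init__(self, whitelist):
        self.whitelist = {addr.lower() for addr in whitelist}
        self.pending = {}  # sender -> [challenge code, list of held messages]

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            return "delivered"
        if sender not in self.pending:
            # A real system would send this code to the sender, e.g. via a Web page.
            self.pending[sender] = [secrets.token_hex(3), []]
        self.pending[sender][1].append(message)
        return "held"

    def challenge_for(self, sender):
        """The code the sender must type back (exposed here for the demo)."""
        return self.pending[sender.lower()][0]

    def respond(self, sender, answer):
        """Check a response; on success, whitelist the sender and release held mail."""
        sender = sender.lower()
        entry = self.pending.get(sender)
        if entry is None or answer.strip().lower() != entry[0]:
            return []
        del self.pending[sender]
        self.whitelist.add(sender)
        return entry[1]


if __name__ == "__main__":
    gate = ChallengeResponseGate(["alice@example.org"])
    print(gate.receive("alice@example.org", "Lunch?"))  # delivered
    print(gate.receive("bob@example.net", "Hello"))     # held
    code = gate.challenge_for("bob@example.net")
    print(gate.respond("bob@example.net", code))        # ['Hello']
    print(gate.receive("bob@example.net", "Thanks"))    # delivered
```

Note that every legitimate newcomer, like Bob here, has to complete the challenge before any mail gets through. That is the burden on correspondents the paragraph above warns about.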
Regular expressions
A regular expression is a text string that uses wildcards (symbols that stand for other characters) to describe a text pattern. For example, a single regular expression can identify the many intentional misspellings of the male sexual-performance drugs Cialis and Viagra. Spam utilities also often use regular expressions to add addresses to whitelists and blacklists.
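Here is a sketch of both uses with Python’s `re` module. The look-alike substitutions and the blocked domain `pharma-deals.example` are hypothetical examples, not any product’s actual rules. Each drug pattern lets every letter be replaced by a common look-alike and separated by spaces or punctuation. The sender pattern is a blacklist entry that covers every address at a domain and its subdomains.

```python
import re

# Separators spammers insert between letters: spaces, punctuation, underscores.
SEP = r"[\W_]*"


def obfuscated(*letter_classes):
    """Build a case-insensitive pattern that allows separators between letters."""
    return re.compile(SEP.join(letter_classes), re.IGNORECASE)


# Each position lists the letter plus assumed look-alike substitutes (not exhaustive).
VIAGRA = obfuscated("v", "[i1!|l]", "[a@4]", "g", "r", "[a@4]")
CIALIS = obfuscated("c", "[i1!|l]", "[a@4]", "[l1|]", "[i1!|l]", "[s5$]")

# A blacklist entry as a pattern: any address at a hypothetical domain or its subdomains.
BLOCKED_SENDER = re.compile(r"[^@\s]+@(?:[\w-]+\.)*pharma-deals\.example", re.IGNORECASE)


def mentions_drug(text):
    return bool(VIAGRA.search(text) or CIALIS.search(text))


def sender_blocked(address):
    return BLOCKED_SENDER.fullmatch(address) is not None


if __name__ == "__main__":
    print(mentions_drug("Cheap V1AGRA today"))              # True
    print(mentions_drug("Lunch on Tuesday?"))               # False
    print(sender_blocked("sales@mail.pharma-deals.example"))  # True
```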
Reverse DNS filters
Reverse DNS (or rDNS) filters take the Internet address of the server that sent a message, as recorded in the message header, and look up the host name registered for that address. This tells them whether the message came from a viable Internet address. If the address has no rDNS name, or bears a generic one, for example, the message is deemed spam.
Statistical learning filters
Some filters judge e-mail messages by examining the kind and frequency of the words within them. The most common examples of these statistical learning filters are Bayesian and Latent Semantic Analysis (LSA) filters. Such filters learn that, in most cases, a message containing repeated instances of the word Viagra is spam. But suppose you’re in the pharmaceutical business and routinely find this word in legitimate messages. The beauty of statistical filtering is that, because you tell the filter what is and isn’t spam by training it with both good and bad messages, it bases its judgments on the e-mail you typically receive rather than on a stock list of words.
The other advantage of statistical learning filtering is that it’s adaptable. If some new wonder drug or financial scam comes along, you provide the filter with a couple of examples of spam messages about it, and future instances and variations of the message will be filtered out. Nearly all modern spam utilities employ some form of statistical learning filtering. Developers often call these filters Bayesian because they rely on Bayes’ theorem to estimate how likely a message is to be spam, but the Bayesian filter in one program may work differently from the Bayesian filter in another. The key to success with Bayesian filtering is teaching. If you fail to correct the filter when it makes a mistake, labeling good messages bad or bad messages good, it becomes less accurate.
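Here is a minimal multinomial naive Bayes filter in Python, one common form of the Bayesian approach. Real utilities differ in how they split messages into words, smooth rare words, and combine the evidence. `NaiveBayesFilter` and the sample messages are hypothetical. The usage mirrors the pharmaceutical case. After training on a little generic spam, a message mentioning Viagra scores as likely spam. After the user teaches the filter with two legitimate order messages, the same text scores as likely good.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter (illustrative sketch)."""

    def __init__(self):
        # True = spam, False = good mail.
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def train(self, text, is_spam):
        """Learn from one labeled message; also how you correct a mistake."""
        self.word_counts[is_spam].update(tokenize(text))
        self.message_counts[is_spam] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words) from the training seen so far."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = len(vocab) or 1
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in (True, False):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior: share of training messages with this label (add-one smoothed).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                # Laplace smoothing: a word missing from one class doesn't zero it out.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalize the two log scores into a probability without overflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores[True] - top)
        good = math.exp(log_scores[False] - top)
        return spam / (spam + good)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("buy cheap viagra now limited offer", is_spam=True)
    f.train("cheap viagra pills online", is_spam=True)
    f.train("meeting agenda for tuesday", is_spam=False)
    f.train("lunch with the team on friday", is_spam=False)
    print(f.spam_probability("viagra shipment invoice"))  # about 0.75: looks like spam
    # Teaching: the pharmaceutical user marks legitimate order mail as good.
    f.train("viagra shipment invoice attached", is_spam=False)
    f.train("invoice for viagra order", is_spam=False)
    print(f.spam_probability("viagra shipment invoice"))  # about 0.16: now looks legitimate
```

The same `train` call handles new scams. A couple of labeled examples shift the word counts so that later variations score higher, which is why correcting mistakes matters so much.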
[Christopher Breen is a senior editor at Macworld.]