Tricks and technologies for dealing with spam
The first rule of spam-stoppers is that no solution is perfect, but
some are more imperfect than others. With that in mind, here are some
thoughts on the state of the art.
Current spam rate
My server's "spam rate" as of April 2009 is nearly 6000 messages per
day, of which some 350 are held by TMDA (see below),
and the rest are for nonworking addresses that bounce. The rate seems
to hold steady for three to six months, and then suddenly doubles; in
December 2004 it was only around 100 messages per day with 15 blocked by
TMDA. At most one or two spam messages per month actually reach
my inbox; usually, these use a return address that points to a
badly configured autoresponder that responds to bounces, which TMDA
accepts as a confirmation.
These I immediately blacklist.
This "spam-rate" computation is based on recording all
double-bounces, which happen when the destination address was invalid so
a bounce was generated, but the bounce itself was also undeliverable.
This can happen directly, as when a spammer sends to a non-existent
recipient address with a non-existent return address, or indirectly via
TMDA. When TMDA sends a confirmation request, it looks like a bounce to
the mail system, so if the original sender address is bogus, the
confirmation gets double-bounced straightaway. Note that this also
includes messages destined for old throwaway addresses that get spammed
frequently (that's why I threw them away); I send these straight into
the double-bounce logger, since that reduces the load on my server.
The count may also include a few random probes from viruses, though not
from one in particular that sends huge volumes of probes. This virus
(possibly W32.Klez.E@mm) sends emails that purport to come from
inet@microsoft.com, which at least makes them easy to eliminate.
Astonishingly, this virus guesses destination email addresses at random,
and may send hundreds of random emails over the course of a day before
it gives up (or somebody wonders why their computer is so slow). So
far, the virus' success rate at my site has been zero, probably due in
part to the fact that I use the qmail
mail transport agent, which doesn't check whether the address is
good at the time it accepts the message; this probably slows the virus
down. It may be more effective when it tries to guess addresses at
yahoo.com.
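The double-bounce tally described above could be computed along the following lines. This is a minimal sketch, not the actual logger: it assumes a hypothetical log with one line per double-bounce, each line starting with an ISO 8601 timestamp, and the function name `daily_double_bounce_counts` is invented for illustration.

```python
from collections import Counter
from datetime import datetime


def daily_double_bounce_counts(log_lines):
    """Count double-bounce records per calendar day.

    Assumed (hypothetical) log format: one record per line, whose first
    whitespace-separated field is an ISO 8601 timestamp such as
    2009-04-01T00:03:12. Lines that do not fit are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            stamp = datetime.fromisoformat(fields[0])
        except ValueError:
            continue  # not a record in the assumed format
        counts[stamp.date().isoformat()] += 1
    return dict(counts)
```

Averaging the per-day counts over a week or a month would give a "messages per day" figure of the kind quoted above.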
What I do now
In order to cut the flow of spam to an acceptable trickle, I use a
combination of methods:
- Disposable addresses
- I have long used special-purpose email addresses to subscribe to
mailing lists; when the temporary address starts getting spammed,
I change to a new address. The qmail MTA makes this easy, by giving
me an infinite supply of addresses I can use.
- Whitelisting with TMDA
- The Tagged Message Delivery Agent (TMDA) is usually configured to
accept email from specified "whitelist" addresses, to refuse it from
"blacklist" addresses, and to accept it from all others only after
confirmation. See my "Why I use TMDA to reduce spam" page for why
this is so effective.
- Tagging with SpamAssassin
- SpamAssassin is the best-known example in the "content filtering"
category. It looks for telltale features of spam in the message
headers and body, and produces a "spam score"; if the score is over
a certain threshold, the message is considered spam. As mentioned
below, I only use
SpamAssassin to suppress challenges for probable spam to a valid
address, so correspondents who are whitelisted need not worry
about mentioning anything (like spam) that might get their
message discarded. Not only does this reduce the outgoing load
on my server, it reduces the chance of adding to someone's
backscatter misery. And TMDA still holds on to the message for
two weeks, so I have a chance to look for it if I suspect
something is missing.
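The way the whitelist, the blacklist, and the spam score combine can be sketched as a single decision function. This is only an illustration under stated assumptions, not TMDA's or SpamAssassin's actual configuration or API: the function name, the returned labels, the order in which the lists are checked, and the threshold of 5.0 are all hypothetical.

```python
def classify_incoming(sender, spam_score, whitelist, blacklist, threshold=5.0):
    """Decide what to do with a message sent to a valid address.

    sender     -- envelope sender address
    spam_score -- content-filter score for the message (assumed input)
    whitelist, blacklist -- sets of lowercase addresses
    threshold  -- hypothetical score above which no challenge is sent
    """
    sender = sender.strip().lower()
    if sender in blacklist:
        return "refuse"
    if sender in whitelist:
        # Whitelisted senders are never scored, so their wording cannot
        # get a message discarded.
        return "deliver"
    if spam_score >= threshold:
        # Probable spam: hold the message but send no confirmation
        # request, which avoids adding to backscatter.
        return "hold"
    # Unknown sender, plausible message: hold it and ask for confirmation.
    return "hold_and_challenge"
```

Note that the score only ever decides whether a challenge goes out; in both "hold" cases the message stays available for later inspection.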
Discarded solutions
Here are some things I've tried and discarded as ineffective or
impractical:
- MIME-type filtering.
- I used to filter out 95% of all spam by disallowing HTML email
and attachments, and provided a separate address for people who
wanted to send me attachments, which I kept hidden. (This was
even more effective for viruses, which always have attachments
these days.) But this had several flaws:
- The world is full of people who won't turn off HTML when
sending a message, either because they can't be bothered,
or they haven't a clue. (I suspect most of the latter
don't even know that their mail program sends HTML
messages in the first place. If you're one of those, you
should read about the ASCII Ribbon Campaign (among other pages)
for why HTML email is a poor idea, and then the "How to Turn
Off HTML in Your Outgoing Mail Messages" page.)
- A few such people had used my "hidden" address as my
primary email address in their address book, increasing
the chance that it would be disclosed to spammers.
- By December 2003, the total volume of spam had increased
such that stopping 95% of it wasn't good enough.
That's when I decided I had to switch to TMDA.
- Traditional content-based filtering.
- The way SpamAssassin is traditionally used is to redirect messages
based on the "spam score"; if over a certain threshold, the message
is considered spam, and is sent to the recipient's quarantine folder.
I tried this at work, using SpamAssassin to tag messages with
"**** SPAM" in
the "Subject:" line, but it wasn't very effective, though that
may be at least in part because the database of spam signatures
was not as up-to-date as it could have been. I set it up for
myself to test it without actually blocking anything; for the
office administrator, we configured her mail reader to quarantine
any messages with this tag. Then one day, the office
administrator sent me an email with "IMPORTANT!!!!" in the
subject and the body in all-caps. Not surprisingly, SpamAssassin
tagged this as spam, which was bad enough; what was worse was
when the admin didn't see my reply, because the subject still had
"**** SPAM" in it and the message got quarantined!
I have not tried any of the commercial spam-filtering solutions,
because they all seem to consist of some kind of content-based
filtering, usually in combination with proprietary technologies. If the
proprietary technologies were good enough, they wouldn't need the
content filtering, so I'm not inclined to spend the money for what must
be an incremental improvement. (True, it would be equivalent to hiring
somebody else to maintain the signature database, so it would be more
current and therefore more effective, but I'm leery of false positives.)
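Of the discarded approaches in this section, the MIME-type filter is the simplest to sketch. The following is a minimal illustration using Python's standard `email` package, not the filter that was actually deployed; the function name `violates_plain_text_policy` is hypothetical, and the rule it applies (any `text/html` part or any attachment fails) is an assumption about what "disallowing HTML email and attachments" means in practice.

```python
from email import message_from_string


def violates_plain_text_policy(raw_message):
    """Return True if the message contains HTML or an attachment."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_type() == "text/html":
            return True
        # Treat an explicit attachment disposition or a filename as an
        # attachment.
        if part.get_content_disposition() == "attachment" or part.get_filename():
            return True
    return False
```

A filter like this is cheap because it never inspects the wording of the message, but, as noted above, it also rejects legitimate senders who cannot or will not turn off HTML.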
The future of antispam
There are several other proposals for stemming the flood of spam, but
they tend to be longer term. They consist of changes to the
infrastructure that make it easier to detect forgeries, but they won't
have much impact until at least one of them is widely implemented.
- DomainKeys
- This is a Yahoo proposal for authenticating the sending domain (but
not the user). If an email purports to come from a sending domain
whose advertised policy is that it always signs outgoing email, and
the signature is missing or invalid, then the email is clearly a
forgery, and can be dropped or quarantined. Once this is widely
deployed, it could be quite effective; there are ways to defeat it,
but they would be prohibitively expensive for large mass mailings.
- ???
- [There was another similar proposal I read earlier, but I can't
remember where now. -- rgr, 10-Dec-04.]
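The DomainKeys rule described above reduces to a small decision table. The sketch below shows only that logic; it performs no DNS lookup and no signature verification, and the function name, its three boolean inputs, and the returned labels are all hypothetical stand-ins for what a real verifier would supply.

```python
def domainkeys_verdict(signs_all_mail, signature_present, signature_valid):
    """Classify a message from the sending domain's point of view.

    signs_all_mail    -- the domain advertises that it signs all outgoing mail
    signature_present -- the message carries a signature header
    signature_valid   -- the signature verified against the domain's key
    """
    if signature_present and signature_valid:
        return "authenticated"
    if signs_all_mail:
        # The domain promised a valid signature and none was found.
        return "forgery"
    # No promise was made, so a missing or bad signature proves nothing.
    return "unknown"
```

Only the "forgery" verdict justifies dropping or quarantining a message; until signing policies are widely published, most mail would fall into the "unknown" case.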
Bob Rogers
<rogers@rgrjr.dyndns.org>
$Id: antispam.html 267 2010-03-23 01:03:57Z rogers $