Paul Wouters's (paul@xtdnet.nl) personal spam statistics 1997-2004
Total amount of spams received in my life as of January 1st, 2005: 141.329 spams


Most of the current graphs
are made with Mail::Graph, which can be downloaded through CPAN. Some are
made manually with gnuplot.
Eight years of spam
I started putting up these graphs to show people why exactly spam (or UCE) is
such a very bad thing. Sadly, I was right. I only wish people had realised
this a few years ago. The result is now very disheartening:
- On January 1st 2005 00:00, I had received 142.609 spams in my entire life.
- I am currently receiving over 5000 spams/month.
- On January 1st 2003 I was "only" receiving 35 spams/day.
- On January 1st 2004 this had increased to 355 spams/day.
- This year's increase exceeds Moore's law.
- In fact, since 1996, spam has been roughly an exponential problem for me.
- Even if I only linearly project the total amount of spam for 2004 based on the current statistics, I will receive as much spam in 2004 as in the last 7 years. Of course, since we know spam is exponential, it will be much, much more :( (update 2005: In fact, it was twice as much)
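As a rough sanity check on the figures in this list, here is a small Python sketch. It uses only numbers quoted on this page (the two daily rates above and the yearly totals from the archive index further down). The 18-month doubling period for Moore's law is an assumption on my side; a 24-month period is also commonly quoted and would only make the gap larger.

```python
import math

# Daily rates quoted in the list above.
rate_2003 = 35    # spams/day on January 1st, 2003
rate_2004 = 355   # spams/day on January 1st, 2004

# Growth over one year, and the doubling time that growth implies.
growth_factor = rate_2004 / rate_2003
doubling_months = 12 * math.log(2) / math.log(growth_factor)

# Assumption: Moore's law taken as one doubling every 18 months.
MOORE_DOUBLING_MONTHS = 18

# Yearly totals as listed in the archive index on this page.
yearly = {1997: 541, 1998: 661, 1999: 598, 2000: 1230,
          2001: 1629, 2002: 7796, 2003: 33469, 2004: 95405}
before_2004 = sum(n for year, n in yearly.items() if year < 2004)
ratio_2004 = yearly[2004] / before_2004

print(f"growth 2003 -> 2004: {growth_factor:.1f}x")
print(f"implied doubling time: {doubling_months:.1f} months "
      f"(Moore's law, assumed: {MOORE_DOUBLING_MONTHS} months)")
print(f"1997-2003: {before_2004} spams, 2004: {yearly[2004]} spams, "
      f"ratio {ratio_2004:.2f}")
```

Under these assumptions the daily rate doubles in well under half a year, and 2004 alone comes out at roughly twice the total of the seven years before it.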
All spams available
All the spam I have received is available verbatim in my spam archive.
Unix mbox format of all my spam is available upon request, with proper motivation.
Note that some spam was directly addressed to me, while others
got to me because I'm the "postmaster" for various domains.
Also, during 1999 I enabled the RBL
list on top of the previous anti-spam measures. In 2000, after many RBLs
with weird policies appeared, we switched off the RBL and moved
to a tagging-only system. We are currently using SpamAssassin to mark
potential spam as such, which uses various blacklists through DNS. We
also use a virus scanner to remove viri.
2003 marks the year where I had to give up checking my spam folder for false
positives; it simply became unmanageable.
If someone knows what happened in September 2004 that explains the decrease in
spam, I would be very interested. Perhaps this is related to a software
upgrade (e.g. SpamAssassin), but if so, then I cannot remember it.
2005 marks the year where I have given up collecting spam. The distinction
between a virus, a bounce, an error, or a piece of spam has become
impossible to draw, and I think by now my original point has been made clear.
- my old xs4all/hacktic spam
- 1997 (541 spams)
- 1998 (661 spams)
- 1999 (598 spams)
- 2000 (1230 spams)
- 2001 (1629 spams)
- 2002 (7796 spams)
- 2003 (33469 spams)
- 2004 (95405 spams) <--- Clicking this link will likely kill your browser!
- Total amount of spams received in my life as of 01-01-2005 00:00 is 141.329 spams
(Discrepancies in numbers are due to the difference in processing email between Mail::Graph and
Hypermail. Hypermail drops messages with the same message-id, and apparently some spamruns happen
multiple times with the same message-id. Therefore Hypermail sees 95386 spams in 2004 while Mail::Graph
sees 96882 spams.)
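To illustrate how dropping repeated message-ids makes two counts of the same mailbox differ, here is a minimal Python sketch. It is not how Hypermail or Mail::Graph are actually implemented; the function name `count_total_and_unique` and the sample messages are made up for the illustration, and keeping messages that have no Message-ID at all is an assumption.

```python
import email
from email.message import Message
from typing import Iterable, Tuple


def count_total_and_unique(messages: Iterable[Message]) -> Tuple[int, int]:
    """Return (all messages, messages left after dropping repeated Message-IDs)."""
    total = 0
    unique = 0
    seen = set()
    for msg in messages:
        total += 1
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            # Assumption: a message without a Message-ID is always kept.
            unique += 1
            continue
        msg_id = msg_id.strip()
        if msg_id not in seen:
            seen.add(msg_id)
            unique += 1
    return total, unique


# Tiny hand-made sample: the same spam sent in two runs with one Message-ID.
sample = [
    email.message_from_string("Message-ID: <a@example.invalid>\n\nbuy now"),
    email.message_from_string("Message-ID: <a@example.invalid>\n\nbuy now"),
    email.message_from_string("Message-ID: <b@example.invalid>\n\nother"),
]
total, unique = count_total_and_unique(sample)
print(total, unique)
```

For a real archive, iterating over `mailbox.mbox(path)` from the Python standard library yields message objects that can be passed to the same function.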
Effect of Verisign Wildcard and SoBig virus
The graph below is a closeup of the last few months, where a few major
things happened: the closure of some blacklist DNS services, the SoBig
virus, and the Verisign Wildcard issue.
The big peak on August 26 is the result of the SoBig virus. An especially
interesting tidbit is that we never return to pre-SoBig
levels! Therefore one has to conclude that the last few months of insane
spam levels are mostly due to SoBig-infected machines still running and
spamming. I do not see any noticeable spam increase as a result of the
Verisign Wildcard. That does NOT mean I think the wildcard is a
good idea! I am strongly opposed to Verisign's stupid wildcard idea to
spam people who make a typo with their ad-driven search engine!
Viri statistics
NOTE: I have two DVDs full of viri collected over the years until 31-12-2004, but they
still need to be processed. Any volunteers?
Since the SoBig virus went beyond any previous levels, it flattened out the
virus statistics completely. Below is the graph capped at 200 viri/day.
Here is the uncapped graph, including the first days of MyDoom/Novarg.
SpamAssassin
I am receiving regular comments, feedback and threats. Some come
from people who have now seen the light about their past mistake and
apologised, some from people who asked me how to improve their search
engine hits to get more visitors (duh!), and the most amusing ones are
the legal threats people have sent me. Some even try to use copyright
infringement as the basis of their threat :) These people
never contact me again after I inform them that all my email (including
their complaint) and any public and/or legal documents arising out of
a lawsuit are going to be public and, if needed, put on this same page as
well. Of course, if you appear on these pages, and this is a mistake or
otherwise misleading information about you or your company, feel free
to contact me with more information, and if appropriate, your spam will
be anonymised.
The UEFF
2003 was also the year of the attack of the "United Email Freedom Front" (I bet these guys never saw the Life of Brian, or they wouldn't pick that name). During April they
threatened to DDoS me off the net if I didn't remove my spam archive. Of course
I didn't comply, and went through a few DDoS attacks. The archive survived, and
is still online. Though the motivation of this group/person was not clear,
there are a few hints that it might have a relationship to the MegaMania
'pump and dump' stock-fraud operation. But it gave me some nice "spammer statistics". As a result of those spamruns forged in my name, we received:
- Total emails: 3500
- Bounces/double bounces: 2500
- Remove me / Unsubscribe requests: 200
- Out of Office replies: 100
- Over quota replies: 100
- "personal" answers: 700
Now let's assume that 25% of the addresses used in the spamruns were valid
email addresses. That would mean that the spamrun size would be 10.000. Of
those, 400 people WOULD HAVE GIVEN their address to the spammers in some way.
That would be 4 percent! We also had to block our info@ address for a while,
and blocked over 55.000 attempts to mail us. Where our website normally serves
about 10.000 hits/day, it was doing between 180.000 and 300.000 hits/day
during the attack.
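The arithmetic in this estimate can be written out as a short Python sketch. The reply counts, the run size of 10.000 and the hit counts are the figures from the text above. The run size is taken as given rather than derived from the 25% assumption, and reading the 400 as the unsubscribe, out-of-office and over-quota replies added together is my assumption about how that number was composed.

```python
# Reply counts from the list above (rounded figures).
replies = {
    "bounces": 2500,
    "unsubscribe": 200,
    "out_of_office": 100,
    "over_quota": 100,
    "personal": 700,
}

# Taken as given from the text, not derived here.
assumed_run_size = 10_000

# Assumption: the 400 replies that confirm a live address are the
# unsubscribe, out-of-office and over-quota categories combined.
revealing = (replies["unsubscribe"]
             + replies["out_of_office"]
             + replies["over_quota"])
revealing_pct = 100 * revealing / assumed_run_size

# Web traffic during the attack relative to a normal day.
normal_hits = 10_000
attack_low, attack_high = 180_000, 300_000
low_factor = attack_low / normal_hits
high_factor = attack_high / normal_hits

print(f"address-confirming replies: {revealing} ({revealing_pct:.0f}% of the run)")
print(f"web traffic during the attack: {low_factor:.0f}x to {high_factor:.0f}x normal")
```

A different assumed share of valid addresses, or a different run size, changes the percentage accordingly, which is why real figures on address validity would be useful.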
If anyone has any real statistics on the validity of
addresses in an average run, I would be interested in those. Rejo Zengers of
Spamvrij, a Dutch anti-spam organisation,
analysed a spam CD which
he received from a spammer.
See also: The UEFF page
Spamming is profitable
And for all those people who wonder why spammers do it, the reason is
obviously very simple:
Anonymous pornsite webserver hits since they started spamvertising

Please remember, make spam unprofitable. NEVER buy anything that has been
spamvertised. If you do, you are PART OF THE PROBLEM.
Conclusion
The only conclusion I can reach, after interpreting spam-related events
that have happened to me in the last year, is that spam has become a
matter of organised crime. It should be cracked down on. It also makes me
look in fear at VoIP, especially if combined with ENUM. If we don't
design it to be spamproof, we will not only lose email, we will also
lose the usefulness of our mobile phones!
Most popular topics by spam
To end on a happier thought, let me provide you with the most popular
search engine phrases I have received as a result of publishing all my spams
verbatim on the web, Google indexing them, and people using Google to find them.
One month stood out in which one search word ruled the statistics: Xanax in August.
For entries which occurred in the top 20 in at least 5 months, the (absolute) total becomes:
| rank | search term | # of hits |
| 1 | (free) naughty cards | 8061 |
| 2 | r | 3167 |
| 3 | tonya harding (wedding) sex tape/video | 2722 |
| 4 | el*mono*mario | 2370 |
| 5 | nikole kidman | 2271 |
| 6 | (free) ps2 games (download) | 2174 |
| 7 | nip slips | 1793 |
| 8 | amandacam | 1763 |
| 9 | princess diana pictures | 1511 |
| 10 | disney x | 1316 |
| 11 | milf | 1164 |
| 12 | see through clothes | 945 |
| 13 | rüyatabirler | 810 |
| 14 | carol shelby | 698 |
| 15 | angelina jolie | 630 |