MGA - What to do about SPAM

The MGA With An Attitude

MGAguru.com

WHAT TO DO ABOUT SPAM - in 6 min/day

SPAM -- "What can I do about this crap?" - SP-101

On the "Do Not Do" side: don't let it get you depressed. We may have to live with it, but we don't have to be slaves to it, and it really doesn't take much time to deal with it effectively. I don't want to waste time talking about how you get the stuff or even how to TRY to avoid getting it, because we will all get it sooner or later. I'll cut right to the chase and talk about how to handle it when you are stuck with it.

RECEIVING:
First of all, if you get any x-rated messages on your e-mail account, don't let any young children use your e-mail service. If your kids need or want to use e-mail, get a separate e-mail account for them. And if that starts to get too much spam, change the e-mail address occasionally, and make the kids notify their friends of the new address. Unless there is some very specific reason, young children should not need a long-term static e-mail address.

For your personal e-mail you could do the same thing. When you start getting too much spam you can change your e-mail address, and you won't have much problem with spam for a while, until the spammers figure out your new address, and then you can change it again. This works if you only have a few friends to notify when you change your address. This is similar to having an unlisted phone number. Once too many undesirables know the number you have to change it again. But this is very limiting in only allowing messages from people to whom you actually give your address.

Another limited alternative is to use e-mail filters with a specific acceptance list. Since it is impossible to create a "black list" with every possible undesirable address that could be created, you can instead create a "white list" with addresses of only the people from whom you will accept messages. As limiting as this is, it can still fail if your friend's address is on a spammer's list. Spammers will nearly always forge the return address, and in some cases the address they use in the "From" line could be the address of one of your friends, so some spam messages could get past this method too. And again it is extremely limiting to allow messages from only the addresses you put on record.
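The white list idea can be sketched in a few lines of Python. This is only an illustration, not how any particular e-mail client does it; the addresses and the function name accept_sender are made up for the example.

```python
from email.utils import parseaddr

# Hypothetical white list: the only senders we are willing to accept.
WHITE_LIST = {"alice@example.com", "bob@example.org"}

def accept_sender(from_header):
    """Return True if the address in a From header is on the white list."""
    _name, address = parseaddr(from_header)
    return address.lower() in WHITE_LIST

# A forged From line naming a friend passes the check just like the real thing.
print(accept_sender("Alice <alice@example.com>"))          # genuine friend
print(accept_sender("Cheap Pills <bob@example.org>"))      # forged, still accepted
print(accept_sender("stranger@example.net"))               # unknown sender, rejected
```

The check only sees what the From line claims, which is why a forged address of a friend slips straight through.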

So you may want to step over the line and allow some messages from "unknown friends", such as if you were soliciting for customers from your web site, and you don't know who may be sending you the next message, but you still want to accept that message. Or if you like MGs you may be willing to accept a message from almost any other MG enthusiast, even people you don't know. Then you will have a publicly known e-mail address, and you may want to keep the same e-mail address indefinitely for the sake of the business. This is similar to maintaining a constant street address, constant phone number, or constant PO Box number. You may even be widely advertising your e-mail address. In due time you can count on getting increasing amounts of spam e-mail to a publicly exposed address like this, and it will never decrease, only increase with time, and you will have to learn to live with it. To live with this situation you will want to get a little crafty with filtering.

FILTERING:

In the beginning, if you only get a few spam messages daily, you can just hit the Delete key, and it won't bother you much. By the time you're getting 20 or 30 spam messages daily, mixed in with your good e-mail messages, you may find it difficult to always tell at first glance from the headers what is or isn't spam. Then you end up opening some of these messages just to find out if they are wanted or not, and by that time the spammer has won, and their open message is in front of your eyes before you can hit the Delete key. Of course the spammers are very crafty about this sort of thing, as they will forge the "From" address to attempt to look like a friend, or they will dummy up a subject line to look like it may be something of interest to you. When you have trouble identifying spam by the headers, then spam filters will also have the same problems and failures. When you have to look inside for the clues, then spam filters will also have to look inside to be successful.

There are certain commercial services that will (for a fee) filter your e-mail and weed out most of the spam before you ever see it. Some of these services could be quite good, and maybe you even get better service for a larger fee. If you're doing big business and getting lots of spam, maybe this would be worth the cost. But for personal use, or doing this for a car club or any other entity on a budget, maybe it would be better to avoid the fees. And there is another significant problem with pre-filtering before you download e-mail.

No e-mail filter will ever be perfect, and none will ever catch every spam message. Not only will it miss identifying some spam messages, but it could occasionally flag a friendly message as spam. This can happen when the filter uses rules based on slightly fuzzy logic, making a determination based on how much the message bears characteristics of classical spam. When a filter sees a message that it thinks is spam it will toss it into the bits bin. If you think too much spam is getting past the filter, you might squeeze down on the parameters used for the judging, and the filter may catch more spam and allow less to pass. But the tighter you squeeze the filtering the more likely it is to produce a false positive and start throwing away some of the good mail messages you really do want to receive. So you have to decide how important it is to you not to ever lose one of the good messages. To me, this is like sending an innocent person to execution occasionally just to be sure you are executing 90% of the crooks instead of 80%. I personally do not want to ever lose one of the friendly e-mail messages. So whether the filtering is done before or after downloading e-mail from the server, I still want to retain the ability to review what the filter is tossing into the bits bin. And if it turns out to be throwing away too much good mail I want to be able to change the filtering parameters accordingly. To this end you will have to retain some or all of the junk mail in a holding cell until you can personally review it. But then reviewing lots of spam e-mail is what you were trying to avoid in the first place. Grrrrrrrrr.

After lots of fiddling around with this stuff, I'm finding workable solutions. It is possible to filter e-mail into multiple classes. I have been doing well by filtering most of the junk into a class considered "Extremely likely to be spam" based on some general characteristics. By adjusting parameters properly this may catch about 80% of the spam without ever generating a false positive, or at least no more than one false positive for tens of thousands of rejected messages. For this class of spam I have no worry about tossing it into the bits bin with little or no personal review. Tightening the filtering a little more or using a different method can generate another class of messages considered "Most likely to be spam", out of which there may be an occasional false positive, perhaps less than 1% of all rejected messages. This class may catch 80% or more of the spam that was left after the first case. Personally scanning through the list of From and Subject lines allows deletion of most of these messages without looking inside, and the rest usually only require a momentary glance inside to verify deletion. This may only require a matter of seconds for review, and this is where a certain amount of experience with daily usage can allow fine tuning of the filter parameters based on how often you find a false positive.

Having taken out 80% of the spam on the first pass, and 80% of what's left on the second pass, you should be left with only a few spam messages daily requiring personal intervention. You do not necessarily want to delete these messages outright. What you do with the remaining spam messages that have sneaked past the filtering system will have much to do with reducing the problem of "sneaky" type spam in the future. For this you should allow perhaps 5 minutes daily, and the final result can be 100% of all spam messages being removed.
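The arithmetic of stacking filter passes is easy to check. Here is a small Python sketch, assuming each pass catches a fixed fraction of whatever reaches it; the function name remaining_after_passes is made up for the example, and real catch rates will wander from day to day.

```python
def remaining_after_passes(total, catch_rates):
    """Spam left over after each pass removes its fraction of what reaches it."""
    remaining = float(total)
    for rate in catch_rates:
        remaining *= (1.0 - rate)
    return remaining

# Two passes at 80% each: 100 spam messages shrink to about 4.
left = remaining_after_passes(100, [0.80, 0.80])
print(round(left, 1))
```

So two 80% passes together remove about 96% of the spam, and the few messages left over are the ones that get personal attention.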

Of course this all sounds good, but now you want to know how it's actually done.

THE NUTS AND BOLTS:

Here's what I do for filtering. I use Eudora for the e-mail client, and I use it in Light mode, which is then fee-free and ad-free but unregistered and with no tech support. But no tech support is okay, because it works fine with no questions required.

First Pass Filtering:

The first filter package is Spamnix software, a plug-in for the Eudora e-mail client on a PC, which is a spinoff of SpamAssassin software for a Unix system. Spamnix is shareware that can be downloaded from the net for a 30-day free trial. After that you can renew for another 30 days free. After that it badgers you with a pop-up box about every 30 minutes of use asking you to pay a fee for registration, but it doesn't stop working, so you don't necessarily have to pay the fee if you don't mind the occasional pop-up.

This filter system uses a variety of rules to check things in the e-mail message headers and in the body of the message. It assigns and adds up points (and fractional parts of points) for certain things found that are characteristic of spam messages. It may also subtract a point occasionally for some items that are characteristic of non-spam messages. When the total of the analysis adds up to a preset threshold (5 points by default), the message is identified as spam and transferred into a Spamnix mailbox. When this is done it also adds a trailer to the message displaying the causes and points showing why it was tagged as spam. This is highly educational as well as helpful to your cause. If you reduce the preset points threshold it will identify more messages as spam. Using the default 5 points it will take out about 80% of all spam messages, and I have never seen it generate a false positive in many tens of thousands of messages processed. After reviewing these messages through many thousands of rejections I am now confident enough to simply delete all of these messages without any personal review.

The lower you set the points allowance the more likely Spamnix is to produce a false positive and toss out a good message. Set to 4.0 or 4.5 it may occasionally do that, so I don't go there. If you want to fiddle with this a little, you could set it to 3.0 and watch it reject a few messages from your friends. Then you could send feedback to your friends telling them why their message looks like spam, and maybe advise them not to compose messages like that. In a similar manner you can figure out how to avoid having your own messages tagged as spam. In general, if you send friendly personal messages in plain text with not too many web links included, you can send some pretty large messages without them ever being tagged as spam.
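To make the points idea concrete, here is a toy scorer in Python. It is only a sketch of the general technique: the rules, patterns and point values below are invented for illustration and are not the actual Spamnix or SpamAssassin rules. Only the default threshold of 5 points comes from the description above.

```python
import re

THRESHOLD = 5.0  # default threshold described in the text

# Hypothetical rules: (description, pattern, points). Negative points
# reward things that look like normal personal mail.
RULES = [
    ("body contains HTML tags", re.compile(r"<html|<a\s+href", re.I), 1.5),
    ("mentions a 'free' offer", re.compile(r"\bfree\b", re.I), 1.0),
    ("urgent call to action", re.compile(r"\b(act now|click here)\b", re.I), 2.0),
    ("run of exclamation marks", re.compile(r"!{3,}"), 1.5),
    ("quotes an earlier message", re.compile(r"^>", re.M), -1.0),
]

def score_message(text):
    """Return (total points, list of (description, points) for rules that hit)."""
    hits = [(desc, pts) for desc, pattern, pts in RULES if pattern.search(text)]
    return sum(pts for _desc, pts in hits), hits

def is_spam(text, threshold=THRESHOLD):
    """Tag the message as spam when its total reaches the threshold."""
    total, _hits = score_message(text)
    return total >= threshold

spam = "<html><a href='http://x.example'>Click here</a> for FREE stuff!!! Act now"
friendly = "Click here for the free parts list I promised."
print(score_message(spam))           # shows which rules fired and why
print(is_spam(friendly))             # under the default threshold
print(is_spam(friendly, 3.0))        # a lowered threshold tags it as spam
```

The list of hits plays the same role as the trailer that explains why a message was tagged, and the last line shows how lowering the threshold starts catching friendly mail.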

Second Pass Filtering:

For the second filtering pass I use the programmable filtering capability built into Eudora. Here you can specify words or phrases to search for in any of the individual headers or in the body of the message, and you can specify what to do with the identified message (copy to, transfer to, delete, etc). Once you have a reasonable assembly of filter rules selected from spam messages you have been receiving, most of the spam will be filtered out. The few spam messages that may remain as having sneaked through the existing filters are the ones for which you need to spend a minute to set a new filter rule.

From and Reply To: I very rarely reference the From or Reply To lines, because for spam messages this is nearly always forged. You could get many thousands of spam messages from the same source, and they might all have different information in the From line. In some cases this might even be the name or address of a friend. So in general, the only time I use the From line to identify a message for rejection is when I want to put a known sender (using correct identity) on my black list. This may not even be a true spam message, but just someone who irritates me, and I don't want to accept messages from them. The only other time I may use From for filtering is when there is no other means of replying or linking for contact in the body of the message.
To, Cc, Bcc, or Any Recipient: I also never use these header lines for spam sorting, because my mail server already disallows any message that is not addressed directly to me. Or in the case of the Club mail it could be addressed to any of a number of specified legitimate names, but no other address. I do have a few rules to use the To and Cc lines to move messages to other mailboxes to group them for other specific reasons (like all subscribed list mail for instance).
Subject: I do have a number of filter lines set to catch key words or phrases in the Subject line, but this is only about 25% of the total number of filtering rules. This is because many spammers will not put anything sensibly related to the message in the subject line, and will often use randomly selected subjects or nonsensical computer generated garbage.
Body: Ah, the final judgment day! Because the headers and subject are usually forged or nonsensical, the body of the message must contain some information as required for the recipient (you) to link to a web page (a linking URL) or to reply to the message (link to e-mail address) or to otherwise get in contact with the intended advertiser (phone or fax number). This is so, because no matter how cheap and easy it is to send spam, very few people will ever do it without a cause to which they need your reply. So about 2/3 of my filter rules are based on things found in the body of the message. These things are (usually) not hard to find.
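
A rule list of this kind (look in one header or in the body, match a phrase, transfer the message) can be sketched in Python with the standard email module. This is not Eudora's filter format, just an illustration of the idea; the rules, the mailbox names and the function names are all hypothetical.

```python
from email import message_from_string

# Hypothetical rules: (where to look, phrase to find, mailbox to transfer to).
FILTER_RULES = [
    ("subject", "lose weight", "Junk"),
    ("body", "specialdomain.com", "Junk"),
    ("to", "mg-list@example.org", "Lists"),
]

def body_text(msg):
    """Join the text of every non-multipart part of the message."""
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)
        if payload:
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)

def classify(raw_message):
    """Return the mailbox named by the first matching rule, else 'In'."""
    msg = message_from_string(raw_message)
    body = body_text(msg).lower()
    for where, phrase, mailbox in FILTER_RULES:
        haystack = body if where == "body" else (msg.get(where) or "").lower()
        if phrase.lower() in haystack:
            return mailbox
    return "In"

raw = ("From: whoever@example.net\nTo: me@example.com\nSubject: Hello\n\n"
       "Visit http://www.specialdomain.com/ads/comehere.html today\n")
print(classify(raw))
```

Note that the example message is caught by the body rule even though its From and Subject lines give nothing away.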

So What is in the Body?

Many spam messages will contain HTML code (like a web page). You can view the e-mail HTML code page the same as you can with a web page, right-click or control-click and "view source". If there is no HTML code, then what you see in the body of the message is what you get, and you pick your filtering phrase from there. When there is HTML code, it will likely look quite different from what you see in the visible body of the message, having functional HTML code tags all over the place. But not to be discouraged, as what you need will be in there somewhere.

Start by searching for occurrences of "href", which is the code for a hyper reference connecting to a URL (something located on the web). You would want the one(s) that connect to a web page, similar to this example:
<A href="http://www.specialdomain.com/ads/comehere.html">.
This is where the browser would go if you actually clicked on the link in the spam message, and this you can use for the filter. Most of the time you only need to filter on "specialdomain.com", which is more general and will trash any message linking to that domain. If you're not sure, you might take the time to put "http://specialdomain.com" into a browser and actually look at the index page of that web site to be sure that it is a domain from which you want to reject everything.
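
If you would rather not hunt through the source by eye, the same search for "href" can be sketched in Python using only the standard library. The class and function names here are made up for the example, and stripping a leading "www." is just one simple way to get down to the more general domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML message body."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def link_domains(html_source):
    """Return the sorted domains that the message's web links point to."""
    collector = LinkCollector()
    collector.feed(html_source)
    domains = set()
    for link in collector.links:
        host = urlparse(link).hostname  # None for mailto: links
        if host:
            domains.add(host[4:] if host.startswith("www.") else host)
    return sorted(domains)

source = '<A href="http://www.specialdomain.com/ads/comehere.html">Buy now</A>'
print(link_domains(source))
```

Each domain it reports is a candidate phrase for a new filter rule, after you have satisfied yourself that nothing good comes from that domain.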

If there is no web page link found, then search for "mailto", which will reveal any link to an e-mail address similar to this:
<A href="mailto:comehere@specialdomain.com">.
This is what would call your e-mail client to send an e-mail message if you were to click on this link in the spam message, and this too you can use for the filter. In these cases you may only need a small part of the phrase for the identifying filter, such as "specialdomain.com" or "comehere@". Once you have done this a few times you should be able to find, copy and paste this tidbit into a new filter rule in no more than one minute. (A little later, possibly in a separate message, I may chat more about a few tougher cases where the spammers try to break up words with comment marks and disguise URLs by using ASCII code in place of alpha characters).
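
The "mailto" search can be sketched the same way. This is a rough Python illustration with a deliberately simple pattern; the function names are hypothetical, and it makes no attempt at the disguised cases mentioned above.

```python
import re

# Grab whatever follows "mailto:" up to a quote, space, '>' or '?'.
MAILTO_RE = re.compile(r"mailto:([^\"'\s>?]+)", re.IGNORECASE)

def mailto_addresses(html_source):
    """Return every address that appears in a mailto: link."""
    return MAILTO_RE.findall(html_source)

def candidate_phrases(address):
    """Offer the two short phrases suggested in the text as filter candidates."""
    local, _at, domain = address.partition("@")
    return [domain, local + "@"]

source = '<A href="mailto:comehere@specialdomain.com">'
for address in mailto_addresses(source):
    print(address, candidate_phrases(address))
```

Either short phrase can go into the new filter rule; the domain is the more general of the two.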

If there are no links to web URLs or e-mail prompts, then forget the HTML code and look in the body of the message for a phone number or fax number which you can use for the filter. Somewhere in nearly every spam message there will be a web link or some form of communication contact number that you can use for the filter.

If you should happen to receive a totally blank message, this may be an attempt by a spammer to verify that your e-mail address is active by looking at bounce messages from non-functional addresses. Like most spam messages, there may be little you can do to stop these from coming, but you may be able to set a filter for them anyway. Most spam messages have forged From and Reply To addresses, but for this intent the sender would have to use a legitimate From or Reply To address in order to receive a bounce message. So if the From line looks like it might be a valid e-mail address you could use that for the filter. Otherwise, totally blank messages are moderately rare and easy to spot and delete. If the body of the message is totally blank, the message size may show as "0", in which case you can toss it immediately into the bits bin.
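
Spotting a totally blank message is simple enough to sketch in Python. The function name is_blank_message is made up, and treating every multipart message as "not blank" is a cautious assumption for the example, not a rule from any mail client.

```python
from email import message_from_string

def is_blank_message(raw_message):
    """Return True when a simple (non-multipart) message has an empty body."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        return False  # be cautious: leave anything with attachments alone
    return not msg.get_payload().strip()

print(is_blank_message("From: probe@example.net\nSubject:\n\n"))
print(is_blank_message("From: friend@example.com\nSubject: Hi\n\nSee you Saturday.\n"))
```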

How Low can you Go?

The success rate of this second pass filtering is dependent on having a moderately large set of filter rules, perhaps 300 lines derived from your incoming spam messages. So your initial setup of this filter set will take some time and patience, as you will essentially be looking at and setting filter rules for some 300 spam messages. If it takes an average of one minute for each filter rule you are composing, then it figures that this will take up to 5 hours to accomplish, but it could be spread over a fairly long period of calendar time depending on the volume of spam messages you receive. By the time you have the first 100 filter rules set it may be catching 1/3 to 1/2 of the incoming spam messages not trapped by the first pass filter. With 300 filter rules set it may be catching up to 2/3 or more of the second pass spam. After that, further improvement will continue to be gained by persistently making a new filter rule for virtually every spam message that gets through. With time the number of sneakers will diminish to a surprisingly small level, at least as a percentage of the total spam volume. The reason this works as well as it does is because many spammers send repeat messages containing the same links, or links using the same primary domain. Also if you are tending multiple e-mail accounts you may notice that many identical spam messages are sent to lots of different e-mail addresses.
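
The setup time estimate is plain arithmetic, shown here as a tiny Python sketch. The one-minute-per-rule figure is the average assumed in the text; the function name setup_hours is made up.

```python
def setup_hours(rule_count, minutes_per_rule=1.0):
    """Total hours spent writing filter rules at a given average pace."""
    return rule_count * minutes_per_rule / 60.0

print(setup_hours(300))        # the 300-rule starter set
print(setup_hours(300, 2.0))   # if each rule takes two minutes instead
```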

Here is a review of the results I have been getting with this system. For my personal e-mail address, which was new 15 months prior to writing this report, I receive 100 to 150 total messages daily, about 2/3 being spam messages. My filter list has grown to about 1000 entries, and now only 1 or 2 spam messages sneak past the filters each day.

For our club web server we have more than 20 legitimate e-mail addresses, with 8 of those being forwarded to me as editor and webmaster. These are all long-standing e-mail addresses in the public domain, and all are published on the club web site in various places (some more than others). The club web site now has several hundred web pages including a 6 year archive of monthly newsletters. This site is currently getting more than 6000 browser visits weekly from more than 4000 unique addresses. The key point here is that the 8 club e-mail addresses which I tend are currently averaging nearly 500 spam messages daily, mixed in with only a dozen or so friendly messages each day. My second pass filter list now has more than 3000 records and is very effective indeed. Statistical daily average activity is something like this:

500 spam messages total (each day)
366 spam caught by 1st pass filter = 73.2%
123 spam caught by 2nd pass filter = 24.6% (92% of those escaping the 1st pass)
9 spam needing personal attention = 1.8%

So even though 3000 filter records may have required up to 50 hours to accumulate over the past couple of years, my time spent tending to spam e-mail here is now only about 10 minutes daily, all used for making new filter rules. And after that 10 minutes of personal attention all 500 spam messages have been banished, with no false positives. Hopefully this form of filtering will be able to keep up with any increase in spam volume for a while, as I could likely tolerate up to 1000 spam messages daily. We are talking about banishing over 180,000 spam messages per year (and still likely growing). Sometime soon I hope to put the SpamAssassin program and a substantial part of my filter list on the server so that most of the garbage will be tossed in the bits bin before I have to download all those messages. Here's hoping your personal spam problem is a lot smaller, and I hope this may help you to better understand and manage it.
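
For anyone who wants to check the daily figures, here is a short Python sketch that recomputes the percentages from the counts listed above. The variable and function names are made up. The listed counts add up to 498 rather than exactly 500, which fits with the figures being rough daily averages.

```python
DAILY_SPAM = 500       # total spam per day, from the list above
FIRST_PASS = 366       # caught by the first pass filter
SECOND_PASS = 123      # caught by the second pass filter
BY_HAND = 9            # needing personal attention

def percent(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

escaped_first = DAILY_SPAM - FIRST_PASS  # what the second pass has to deal with

print(percent(FIRST_PASS, DAILY_SPAM))      # share caught on the first pass
print(percent(SECOND_PASS, DAILY_SPAM))     # share caught on the second pass
print(percent(SECOND_PASS, escaped_first))  # second pass catch rate on the escapees
print(percent(BY_HAND, DAILY_SPAM))         # share left for personal attention
print(DAILY_SPAM * 365)                     # spam messages per year at this rate
```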