MDDHosting Forums

Taking control of your spam



After a few exchanges with Michael about a minor issue I was having, the topic came up of how I handle my spam and I mentioned that I am pretty involved in how I deal with it. He asked if I could add something to this site in order to help others, so here it is.

 

While I have made every effort to ensure its accuracy, I am not responsible for any problems or damages that may occur.

 

I hope some of you find this helpful!

 

(I had these nicely formatted in Word but lost it all when pasting into this forum. I will try to straighten them out a bit and make them easier to read!)


Taking control of your spam: Part 1 - Sorting by spam level

Written by Kevin Dommer

 

Although cPanel allows some basic configuration of SpamAssassin to help you handle your spam, it is a bit limited. I’ve seen some cPanel demos around the web, and some have features and options that others do not. I’m not sure why they differ, but this post aims to help other cPanel users here at MDD Hosting take better control of their spam. For example, I have seen a “Spam Box” option in some cPanel demos, but I don’t have this option in my cPanel with MDD. Not to worry: you can still set up a “spam box”, and that is just one of the ways you can sort your spam.

 

I am going to try to write this in as clear a way as possible, but there is only so much you can do with a technical document and I am not a technical writer.

 

Before I begin, the examples I will give are just a guideline. There are several ways that one can achieve handling of spam. Some ways are easier, some are harder. I’ve been using SpamAssassin for around 9 years at the time of this writing and this is how I have come to prefer handling my spam. Over the years I have gathered bits of information from various online sources on how to do some of these things, and some of it I have come up with myself through experimentation. You can follow this to the letter or change things a bit to suit your own needs or tastes.

 

This tutorial is presented with some assumptions as listed below. I am not going to go into too much detail about these things because they are something that should be understood to a certain degree if you decide you want to follow this tutorial. In some cases my instructions may be enough even if you are treading in new territory but prior knowledge or experience in these things may help.

 

ASSUMPTIONS:

1) You know what SpamAssassin is and have a basic idea of how it works (rule-based spam filtering which assigns a spam score to every email it scans). For more information, see the official SpamAssassin website at http://spamassassin.apache.org.

2) You are familiar with cPanel and know how to get around in it.

3) You know how to access and edit plain text files on your site (whether through cPanel’s File Manager or by using your favorite FTP application). This will come into play more for bayes training.

4) You know how to set up email accounts on your site.

5) You know how to access the email in your accounts (both from your PC and through Webmail).

6) You understand that this is merely a guide and no guarantees are made as to the results you will receive. Incorrect or careless settings could result in missing emails! Neither MDD Hosting nor I assume any liability for any problems or damages that may arise from following this guide.

7) The settings I use are for a single domain (no sub-domain) and although we have several email addresses, they are all for a single household and therefore we have no issues with privacy in the sense that we don’t mind if someone else accidentally sees an email not intended for them. (This can happen when going through a “spam box” and when training SpamAssassin’s Bayesian database in part 2). There are steps you can take to minimize this, but it is likely to happen at some point. This tutorial is not meant for resellers or managers of websites with various clients who have individual (private) email accounts through your website. I will include some additional info on how to maintain a bit of privacy while still training your bayes database.

8) Advanced settings and adjustments like this are outside of the normal scope of MDD Hosting support. While they MIGHT help with minor issues, they should not be expected to do so.

 

Setting up your multi-level spam handling

The goal:

1) We will designate a minimum score for SpamAssassin to mark an email as spam. 2) We will designate a slightly higher score and set up a filter to send those messages to another mailbox (so they do not come into your inbox every time you check your email). 3) We will designate an even higher score and set up another filter to send those messages to yet another mailbox, with the eventual goal being that those very high scoring spam mails will be deleted without you ever having to look at them (optional but recommended after taking some time to verify your settings).

 

1) Set the number of hits required before a mail is considered spam. SpamAssassin’s default is 5. Over time you may want to adjust this number up or down. For our purposes here, this number is not critical, but ideally it will flag a message as spam only if it really is spam.

a) In cPanel, click on the SpamAssassin shortcut

b) Click the ‘Configure SpamAssassin’ button

c) For required_score, enter 5

d) Click ‘Save’ at the bottom to save the changes

 

2) Back in the SpamAssassin configuration page you will notice that there is a Spam Auto Delete option (and you may have a Spam Box option depending on your host provider if you are reading this from a provider outside of MDD Hosting). While at some point the plan is to use this feature, I do not recommend setting it here! We will do this manually later as it will give us more control, and even after that I do not recommend changing the settings here because it can cause problems with the filters we will create. In short, do not change any of these ‘Spam Auto Delete’ settings or buttons. With that warning out of the way, let’s set up some new email accounts and some account-level mail filters!

a) Back at your main cPanel page, click on the Email Accounts shortcut

b) Create a new email account called spam (you can use any name you want, but I will refer to it as spam in this tutorial). This is where our mid-range scoring spam will be redirected.

c) Create another new email account called spam2 (you can use any name you want, but I will refer to it as spam2 in this tutorial). This is where our high scoring spam will get redirected to, and the goal is to eventually delete this account and have the emails being sent here get immediately deleted instead. Emails scoring this high are always spam, so straight to the trash they will go (eventually)!

d) Back at your main cPanel page, click on the Account Level Filtering shortcut

e) NOTE: If you already have other account level filters in place, you will need to decide in what order you want them processed. The order they are listed in the filter list is the order in which they are processed when your mail comes in. Incorrect filter order can cause unexpected results! I have no way of knowing what other filters you might have, so use your best judgment, but generally speaking you will want these new spam filters first. We will put numbers at the beginning of our new filter names to help remind us what order they need to be in (very important). cPanel always puts the last filter you edit at the end of the list. If they end up out of order, edit the filter you want to move down and click Activate again; that sends it to the bottom of the list.

f) Click the ‘Create a New Filter’ button. This will be our filter for very high spam. Initially you want to set this pretty high (we will start with 15), but you will bring this down quite a bit over time.

1) Filter Name: #1: Spam High

2) Rules: [Spam Bar] [Contains] +++++++++++++++ (note that is 15 + signs)

3) Actions: [Deliver to folder]

4) Click the dropdown box that appears, click the + sign next to your domain name, then click on Spam2. The box should then say /yoursite.com/spam2

5) Click the [+] button off to the right to add another action.

6) For the second action, choose [Stop Processing Rules]. If you don’t do this, then high spam will be caught again in the next filter and routed to your mid-range spam box rather than the high spam box. We don’t want that to happen!

7) Click Activate to activate the filter then click to go back to the main filter page.

g) Click the ‘Create a New Filter’ button again. This will be our filter for mid-range spam. This should be a number higher than the minimum spam score you set earlier, but not too much higher. If you used the default of 5 earlier, maybe set this to 7 (+++++++). The idea is that any spam below this number (spam score of 6.9 or lower) will go to your inbox as normal because it may not really be spam and you don’t want to miss it. Spam with a score between this number and your “high” number is most likely spam but we can’t really be sure, so we will redirect it to our mid-range spam box and check it periodically.

1) Filter Name: #2: Spam Mid

2) Rules: [Spam Bar] [Contains] +++++++ (note that is 7 + signs)

3) Actions: [Deliver to folder]

4) Click the dropdown box that appears, click the + sign next to your domain name, then click on Spam. The box should then say /yoursite.com/spam

5) Click Activate to activate the filter then click to go back to the main filter page.

h) Now back on your main filter page, be sure that it lists your two new filters and that they are in the correct order. Remember that if you edit one, it may change the order on you. Edit the other one and activate again to bring it down to the bottom. #1 (Spam High) should be first in the list and #2 (Spam Mid) should be second.

 

You might be thinking: So what did I just do and what is going to happen to all of my email? Here’s a quick breakdown:

1) All emails determined to be “ham” (not spam) will be delivered to whichever mailbox they were originally intended for. Email sent to you@yourdomain.com will still arrive in your inbox. Email sent to otheryou@yourdomain.com will still arrive in that inbox.

2) Any emails that SpamAssassin flags as spam with a score below the number of + signs you designated in your second filter (mid-range spam) will still arrive in their originally intended mailbox as mentioned above, with the exception that the subject line will be modified to say that it is spam. Don’t panic if this happens to a legitimate email. We can train SpamAssassin later, and so long as it is a relatively low spam score there really is no harm done anyway. After all, the email still arrived in your inbox, right?

3) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your second filter and below the number of + signs you designated in your first filter will be routed to your spam mailbox. This puts it all in a handy spot that you can check periodically to make sure you didn’t miss out on an email that was incorrectly flagged as spam for some reason. Should you ever find a legitimate email here you can easily forward it to its original recipient (i.e. you) through webmail or however you access this mailbox and it should then arrive in your normal inbox. We will also use the email here to train SpamAssassin in the next part of this tutorial.

4) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your first filter will be routed to your spam2 mailbox. These will be high-scoring spam and as already mentioned, eventually the goal is to delete these without ever seeing them.

5) Note that SpamAssassin only examines messages below a certain size in order to prevent it from choking on large emails and slowing the server down (I don’t remember the exact size). What this means is that emails with large attachments or lots of images don’t get scanned by SpamAssassin at all. Because of this, some spam can slip right through with ease (this also applies to emails from people in your blacklist). Fortunately, very little spam mail is ever larger than the imposed size limit.
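The breakdown above can be sketched in a few lines of Python. This is only an illustration of the routing logic, not anything cPanel actually runs. It assumes the Spam Bar contains one + sign per whole point of a positive score, and the function names and the mailbox name "you" are made up for the example.

```python
import math


def spam_bar(score: float) -> str:
    """Build a spam bar: one '+' per whole point of a positive score (assumed behaviour)."""
    return "+" * max(0, math.floor(score))


def route(score: float, original_mailbox: str, mid: int = 7, high: int = 15) -> str:
    """Return the mailbox a message ends up in under the two filters described above."""
    bar = spam_bar(score)
    # Filter #1 (Spam High) runs first and stops processing when it matches.
    if "+" * high in bar:
        return "spam2"
    # Filter #2 (Spam Mid) only sees messages that filter #1 did not catch.
    if "+" * mid in bar:
        return "spam"
    # Everything else is delivered normally (possibly with a modified subject line).
    return original_mailbox


for s in (2.1, 5.4, 6.9, 7.0, 14.9, 15.0, 22.3):
    print(s, "->", route(s, "you"))
```

Note how the check for the high filter has to come first: a bar with 15 + signs also contains 7 of them, which is exactly why the filter order and the Stop Processing Rules action matter.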

 

 

So what kind of immediate results can you expect from all of this? Well, I almost guarantee that you will continue to receive spam in your inbox when you check your email. Some will be properly flagged as spam, some legitimate email may be flagged as spam, and some spam will not be flagged at all. In other words, you likely won’t see much of an immediate change. This is where the tweaking begins! In my examples above, I purposely suggested very conservative numbers for your required_score as well as the required spam levels for your two email filters. This is to prevent you from missing any important emails, but as a result, your multi-level spam handling will not be very effective until you tweak the numbers, so read on as we get into that. After careful testing of my PERSONAL spam situation, extensive bayes training and tweaking of scores assigned to specific SpamAssassin tests, my personal settings are: required_score 3.7, a mid-range spam score (the minimum score to get placed into my spam box) of 5, and deletion of all emails with a score of 8 or higher. Do not use these settings for yourself! You really must take your time and do lots of bayes training before you can begin to tighten things down like this.

When email is flagged as spam, you can see what kind of score it got by examining the email headers. Using this information, you can then further tweak your required_score number as well as adjust the levels at which your spam gets sorted to your other mailboxes. You may choose to also set up your email client to check messages in your spam and spam2 accounts, but I prefer to do that through webmail. Initially you will probably find much more spam in your normal email account’s inbox than in the spam and spam2 accounts because the scores required for making it to the spam and spam2 account are pretty high.
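If you save a message as raw text, a short Python sketch like the one below can pull the score and the list of tests out of the headers. It assumes the server adds a standard SpamAssassin-style X-Spam-Status header of the form "Yes, score=7.2 required=5.0 tests=..."; the exact headers on your server may differ, and the sample message is made up.

```python
import re
from email import message_from_string

# Made-up sample message; the header layout is an assumption about your server.
RAW = """\
From: someone@example.com
To: you@yourdomain.com
Subject: ***SPAM*** Cheap watches
X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,RCVD_IN_BL_SPAMCOP_NET
X-Spam-Bar: +++++++

Body text here.
"""


def spam_info(raw: str):
    """Return (score, tests) parsed from X-Spam-Status, or (None, []) if absent."""
    msg = message_from_string(raw)
    status = msg.get("X-Spam-Status", "")
    # Long test lists are often folded across lines after a comma; rejoin them.
    status = re.sub(r",\s+", ",", status)
    score_match = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    tests_match = re.search(r"tests=(\S+)", status)
    score = float(score_match.group(1)) if score_match else None
    tests = tests_match.group(1).split(",") if tests_match else []
    return score, tests


score, tests = spam_info(RAW)
print(score, tests)
```

Collecting these numbers for a week or two of mail gives you something concrete to base your threshold changes on.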

 

 

You can also tweak scores for certain SpamAssassin tests, which will help increase the effectiveness of your multi-level filtering.

 

For example, a lot of spam I receive gets points added for a test called “RCVD_IN_BL_SPAMCOP_NET”. This particular SpamAssassin test checks a blacklist to see if the message comes from a known spam source. While I suppose anything is possible, a hit on this test almost guarantees that the message is spam. cPanel allows you to tweak scores for tests and it is easy to do. I forget what the original score for this particular test is, but I have increased it in my SpamAssassin configuration file. Here’s how:

 

From your main cPanel screen, click on the SpamAssassin icon then choose ‘Configure SpamAssassin’. There should be some blank boxes next to labels called score. For this example, type (or copy & paste) the following into that box:

 

RCVD_IN_BL_SPAMCOP_NET 3.5

 

Save your changes, then go back to your configuration and you should see your new test score has been set. The next time an email comes in and gets a hit on that test, it will get 3.5 points added to its score. Obviously this increases the likelihood that this spam mail will be bumped up into your mid-range spam box. There are several other tests that I have adjusted the scores on, including Bayes tests, but the points you assign to these tests should always be tweaked slowly over time. While the above sample just about ALWAYS indicates spam, most tests are less reliable: a test that hits on one piece of spam can also hit on legitimate email. It is also a good idea (before you begin to get more aggressive with your spam filtering) to use the whitelist option in SpamAssassin to whitelist all of your friends and other important email addresses that you want to prevent from getting flagged as spam. Whitelisting an address automatically assigns a score of -100 to the email, which all but eliminates the possibility of a false-positive. You can easily add email addresses to your whitelist and blacklist through the ‘Configure SpamAssassin’ page in cPanel.
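To see why a score override and a whitelist entry have the effect described, here is a small Python sketch of how a message’s total is just the sum of the scores of the tests it hits. The default scores below are invented for the example (real defaults vary by SpamAssassin version), and USER_IN_WHITELIST is, as far as I know, the rule name SpamAssassin 3.x uses for whitelisted senders.

```python
# Invented default scores for illustration only; check your own headers for real values.
DEFAULT_SCORES = {
    "RCVD_IN_BL_SPAMCOP_NET": 1.3,
    "HTML_MESSAGE": 0.1,
    "USER_IN_WHITELIST": -100.0,  # whitelisting is applied as a large negative score
}

# The override entered on the 'Configure SpamAssassin' page in this tutorial.
USER_SCORES = {"RCVD_IN_BL_SPAMCOP_NET": 3.5}


def total_score(hits, overrides=None):
    """Sum the scores of all tests a message hit, with user overrides taking priority."""
    scores = {**DEFAULT_SCORES, **(overrides or {})}
    return round(sum(scores[test] for test in hits), 1)


hits = ["RCVD_IN_BL_SPAMCOP_NET", "HTML_MESSAGE"]
print(total_score(hits))                       # with the invented defaults
print(total_score(hits, USER_SCORES))          # with the 3.5 override
print(total_score(hits + ["USER_IN_WHITELIST"], USER_SCORES))  # whitelisted sender
```

The whitelisted message ends up far below any sensible required_score, no matter which other tests it hits.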

 

 

As each day passes, you will get ham and spam coming in (just as you already have been). Your job now is to look at all of them and see what kind of scores your spam is getting and what kind of scores your ham (non-spam) is getting. Then SLOWLY adjust your required_score as well as the number of + signs that determine how to route your spam in your filters. It is important to resist the urge to use aggressive numbers right away as this will only lead to increased false-positives. My personal goal is all legitimate email and little to no spam making it to my normal inbox, fewer than 20-25 spam messages per WEEK making it to my mid-range spam box with no false-positives, and all the rest making it to my high spam box (which is actually just deleted now). After a couple of months of tweaking and BAYES TRAINING, I have reached that goal. I will cover bayes training in another installment. It is a bit more complicated and involved than what we’ve covered here but the rewards can be quite worth it.
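One way to keep yourself honest while adjusting is to write down the scores you see and check what a candidate threshold would have done to them. The Python sketch below does that with made-up score lists; substitute the numbers you collect from your own headers.

```python
def evaluate(threshold, ham_scores, spam_scores):
    """Return (false_positives, missed_spam) for a candidate threshold."""
    false_positives = sum(1 for s in ham_scores if s >= threshold)
    missed_spam = sum(1 for s in spam_scores if s < threshold)
    return false_positives, missed_spam


# Made-up scores standing in for a week of hand-checked mail.
ham = [-1.2, 0.0, 0.4, 1.9, 3.1]
spam = [2.8, 4.6, 6.3, 7.7, 9.4, 15.2, 21.0]

for t in (7, 6, 5, 4, 3):
    fp, missed = evaluate(t, ham, spam)
    print(f"threshold {t}: {fp} legitimate caught, {missed} spam missed")
```

Lowering the threshold one step at a time and stopping before the first legitimate email gets caught is the “SLOWLY” part in practice.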

 

 

Earlier I mentioned that the eventual goal for the high scoring spam was to get rid of it altogether and never see it. Also as mentioned, I am at that point now, but you must give it time and wait until you are CERTAIN that nothing legitimate ever makes it to that last high spam box. Once you are sure of that, you can make these final changes below. This will cause all of these high-scoring spam emails to get deleted immediately. Warning: You will never see them and there is no way to ever get them back.

1) From your main cPanel screen, go into Account Level Filtering and click ‘Edit’ next to your first (#1 Spam High) filter.

2) For the first action (Deliver to folder), change it to [Discard Message] and click Activate. Be sure to leave the [Stop Processing Rules] action in place.

3) Go back to the main filters page. You will notice that cPanel has now moved your “#1” filter below #2. As mentioned earlier, that’s not what we want and that will allow all those high spams into your mid-range spam box. To fix this problem, click on ‘Edit’ next to the “#2” filter to bring up the filter’s settings. Click Activate and go back to your filter page. They should now be in the correct order.

4) If you are sure you will no longer ever want to keep that high scoring spam again, go ahead and delete the Spam2 email account.
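The reordering behaviour in step 3 is easy to lose track of, so here is a tiny Python model of it. It simply assumes, as described above, that editing and re-activating a filter moves it to the end of the list; the function names are made up.

```python
def edit_filter(order, name):
    """Model cPanel's behaviour: the filter you edit and activate moves to the end."""
    return [f for f in order if f != name] + [name]


def restore(order, wanted):
    """Re-activate every filter in the wanted order, which leaves the list in that order."""
    for name in wanted:
        order = edit_filter(order, name)
    return order


filters = ["#1: Spam High", "#2: Spam Mid"]
filters = edit_filter(filters, "#1: Spam High")  # changing the action moves #1 below #2
print(filters)
filters = edit_filter(filters, "#2: Spam Mid")   # re-activating #2 puts it last again
print(filters)
```

With more than two filters, the same trick works if you re-activate each one in the order you want them listed, which is what restore does.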

 

 

I should also mention that no matter how you do it (whether directly through your favorite email client or Webmail), you might want to periodically empty the mail out of your Spam and Spam2 mailboxes, especially if you are concerned about a mailbox size quota. If you are not doing any bayes training, there is no need to keep this extra spam at all once you are done checking for false-positives and determining the scores and if/how you want to tweak your settings. If you will do bayes training, you will want to hang on to them in order to feed them into SpamAssassin. On that note, there is a setting you can adjust in SpamAssassin to automatically learn spam over x score. This is what I do with the really high spam that my filter deletes. I never see it so I can’t use it to manually train SpamAssassin, but I don’t need to because it does it for me as soon as it comes in!

 

See part 2 for information on Bayes Training in SpamAssassin. This allows you to teach SpamAssassin what is legitimate email (“ham”) and what is spam. It takes a while before the bayes filter kicks in (SpamAssassin does not use bayes tests until it has learned at least 200 ham and 200 spam messages), but once it does, SpamAssassin’s accuracy goes up pretty quickly, and what’s more, you can assign higher scores to bayes tests, such as “BAYES_99”, which means that SpamAssassin is 99% sure that the email is spam based on bayes testing. Armed with that, you can assign it a higher score and get it out of your inbox (and possibly even have your #1 filter delete it automatically should you so choose).


Taking control of your spam: Part 2 – Bayes Training in SpamAssassin

Written by Kevin Dommer

 

Even if you don’t plan to incorporate my multi-level spam handling in the previous post, you might want to read through it as a bit of an introduction to this part.

 

I won’t get into a lengthy discussion on what Bayes filtering does or how it works, but I will provide a brief example (although it may not be 100% accurate). SpamAssassin works by comparing a database of “rules” to the email messages it scans. An example might be the word “Viagra”. If an email has the word Viagra in it, SpamAssassin might assign a small number of points to the email based on a built-in rule. If the email scores enough points (from additional rules) to meet the minimum required score, SpamAssassin marks it as spam. This is a VERY basic example, and the rules that SpamAssassin uses are actually much more complex than this (and there are lots and lots of built-in rules). But continuing with this example, let’s say you never get legitimate emails containing the word Viagra. Bayes (or Bayesian) filtering is based on statistics. After SpamAssassin has been trained on several hundred ham and spam samples, another email coming in with the word Viagra might cause it to assign even more points based on a bayes test. Since SpamAssassin has never seen a ham message with that word, it might add points for a test called BAYES_99 (meaning that SpamAssassin is 99% certain that this email is spam from a statistical perspective). On the other hand, if you have a buddy who often sends you emails with the word Viagra in them, and you have taught SpamAssassin that those emails from your buddy are ham (which, technically, you should, because they are not spam), then SpamAssassin might think: of all of the emails trained so far that contained this word, 20% were trained as ham and 80% were trained as spam, therefore my guess is that there is an 80% chance that this email is spam. But it doesn’t stop there. Bayes scanning looks at many different elements of an email and determines what is important and what is not. So that 99% chance may ultimately become an 80% chance of being spam if other elements of the email suggest that it may be legitimate (ham).
Again, this is a very crude example, but it should be enough to give you an idea of how bayes filtering works, and why bayes training can increase SpamAssassin’s accuracy considerably. It should also be obvious that it is also important to train it often and train it accurately.
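The 20%/80% reasoning in the example can be written out as a few lines of Python. This is the same crude picture as the paragraph above, a plain ratio of training counts for one word, and it is not how SpamAssassin actually computes or combines its bayes probabilities. The counts are made up.

```python
def token_spam_probability(spam_count: int, ham_count: int) -> float:
    """Crude estimate: share of trained messages containing the word that were spam."""
    total = spam_count + ham_count
    if total == 0:
        return 0.5  # never seen in training, so no evidence either way
    return spam_count / total


# The word appeared in 80 trained spam messages and 20 trained ham messages.
print(token_spam_probability(80, 20))

# The word has only ever been seen in spam.
print(token_spam_probability(50, 0))
```

It also shows why accurate training matters: every message you sort into the wrong pile shifts these counts in the wrong direction.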

 

There are several different approaches to take when it comes to training SpamAssassin. Unfortunately, cPanel doesn’t offer a direct and easy way to do this. Fortunately though, it can be done with a little custom configuration. Bear in mind though that training is pretty much an ongoing thing. You can probably stop doing so after a while, but as spammers get smarter and spam contents change, training once again becomes a necessity. And although a well-trained SpamAssassin installation can be very effective, you will always have a few that slip through anyway.

 

As mentioned in part 1, there may be some privacy issues when it comes to training. In order to properly train SpamAssassin, you must hand-sort all of the emails that come in to your domain. In my situation this is not a problem because although I have email addresses for myself, my wife and a small software business, we openly view each other’s emails all of the time. Personally, I have forwarders set up to forward copies of ALL email that comes into my domain into one central mailbox that I can use for spam training. This simplifies things for me but this leaves no privacy as I can see all emails that come in (whether they are for me or my wife). You can take this route, or you can do your training on an individual basis. In the latter case, it will be up to each email account user to sort his or her own email.

 

---------------------------------------------------

DECISION TIME:

If you decide to forward all of your many email accounts to one for the purpose of making your training easier, you can follow the steps below. Otherwise just skip to Preparing for Bayes Training.

To forward all emails to one separate mail account for training:

1) Create a new email account. I called mine backup, but you can call it whatever you want. This is the account you will use for sorting your ham & spam for use by the training script.

2) Click on the Forwarders shortcut in cPanel.

3) Click the Add Forwarder button

4) In the first field, enter an existing email address on your domain to forward

5) For the destination, enter your newly created address, such as backup@mydomain.com and save your changes.

6) Repeat the steps to forward each of your existing email accounts to your new “backup” account. It is not necessary to do this with your spam or spam2 addresses since spam will be automatically sorted to those by your filters and the messages there can be easily trained as there shouldn’t be much to sift through.

---------------------------------------------------

 

Preparing for Bayes Training

The goal: 1) We will manually edit our SpamAssassin configuration file to make some changes to the way SpamAssassin uses bayes. 2) We will create special folders in our mailboxes (through Webmail) that will be used to sort ham and spam. 3) We will install a small CGI script onto our site that will automatically teach SpamAssassin the emails you have sorted. 4) We will set up a cron job to run the CGI script every week.

 

1) Edit the SpamAssassin configuration file. This can be done directly through cPanel or by using your favorite FTP program. For the sake of simplicity, I will explain how to do it through cPanel.

a) In cPanel, click on the File Manager shortcut and select ‘Home Directory’. Make sure the option to show hidden files is selected then click Go

b) In the left pane, click on .spamassassin

c) Click on user_prefs to highlight it then click on the Edit button near the top of the browser window

d) Confirm that the encoding type is set to us-ascii then click on Edit

e) Create some blank space in the file by pressing [Enter] at the very beginning of a line (being careful not to disturb anything already in there), then copy and paste the following into the file:

 

use_bayes 1

bayes_auto_learn 1

bayes_auto_learn_threshold_nonspam -2.0

bayes_auto_learn_threshold_spam 15.0

 

f) NOTE: the nonspam threshold should be low enough so that there is no possibility that a SPAM with a low score can be auto-learned as ham. Some spam comes in at a 0, so you definitely want it below 0. I am recommending -2.0 to be conservative, but you can set it even lower if you want to ensure that it definitely can’t auto-learn spam as ham (such as -20.0). You will be manually training your ham anyway.

g) NOTE: the spam threshold should be high enough so that there is no possibility that a HAM that happened to get flagged with a high spam score can be auto-learned as spam. You may notice that the number I gave for an example happens to be the same as the level I recommended for your ‘high spam’ filter in part 1. This will be handy for when the time comes that you decide to outright delete those high scoring emails. If this number matches the level at which you delete high scoring spam, it will automatically be learned as spam before deleting it and you won’t miss out on training any emails. If you want to be extra cautious with this setting for now, set it to something higher like 30.0. You will be manually training all of your spam for now anyway.

h) use_bayes 1 simply tells SpamAssassin to use bayes testing. This will not do anything until you have learned 200 ham and 200 spam messages. You may leave it on, but if for some reason in the future you wish to disable it, just change the 1 to a 0.

i) bayes_auto_learn 1 simply tells SpamAssassin to automatically learn emails as ham and spam as they are scanned, based on the thresholds that you have set. If you want to turn this off, change the 1 to a 0.

j) When finished making your changes, click on the Save Changes button in the upper-right hand corner of the window.
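The effect of the two thresholds in notes f) and g) can be sketched in Python as below. It is a simplified model using the values from this tutorial. As I understand it, real SpamAssassin applies further conditions before auto-learning (for example, it does not count bayes points towards the auto-learn score), which this sketch ignores.

```python
def auto_learn_decision(score, nonspam_threshold=-2.0, spam_threshold=15.0, enabled=True):
    """Return 'ham', 'spam' or None (not auto-learned) for a message score.

    Simplified model of bayes_auto_learn with the thresholds suggested above.
    """
    if not enabled:                 # bayes_auto_learn 0
        return None
    if score < nonspam_threshold:   # clearly below the nonspam threshold
        return "ham"
    if score >= spam_threshold:     # at or above the spam threshold
        return "spam"
    return None                     # everything in between is left for manual training


for s in (-3.0, 0.0, 7.5, 15.0, 22.0):
    print(s, "->", auto_learn_decision(s))
```

Most everyday mail lands in the middle band and is never auto-learned, which is why the manual sorting in the following steps still matters.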

 

2) Create special learning folders in each email account that you will use for training. If you want to allow each email account holder to manage their own sorting (for privacy issues), then the following steps need to be done for each email account in your domain. At a minimum, you will also need to do this for your spam mailbox (mid-range spam). You can choose to have SpamAssassin automatically learn all of the high spam (which ends up in the spam2 mailbox) by matching the auto-learning threshold setting in your configuration file as mentioned above. But if you prefer to train those manually then the following steps need to be performed for that mailbox too. If you choose to forward all emails from all accounts to one other account and use that for all of your sorting and learning then you need to do the following for that account as well. If you do NOT intend to do sorting and training on individual accounts, you really don’t need to do the following for those but it only takes a few seconds and it is harmless to have these extra folders whether you use them or not (you may want to do it individually someday). IMPORTANT NOTE: Most people have their email clients on their PC set to download new messages then delete them from the server. If you are leaving the training up to each individual (or sorting email individually for each email account), then you will need to change your email client (such as Outlook Express) to leave the messages on the server so that they are still there for you to sort. Once sorted, you can then delete them from the server through webmail, but you must be sure you don’t accidentally delete an email in Webmail that hasn’t been downloaded into your normal email client yet. I know this may sound confusing and it can be a bit of a hassle. This is one reason why I chose to forward (copy) all incoming emails to a separate email account that I use for training as outlined above. 
Since these are copies and nobody ever accesses these messages outside of webmail, there is no issue and no changes to Outlook Express are necessary. But this option is also where the privacy issue comes into play. If you do not understand this or need clarification, please post in the forum before proceeding.

a) Going through each email account one at a time, log into your account through Webmail and go into Horde. I only use Horde so these instructions are based on Horde. While I am sure these things can be done in other webmail clients, I am not familiar with them.

b) Proceed to your inbox in Horde then click on the ‘Folders’ button at the top of the window.

c) Click where it says ‘Choose Action’ and select Create. Your browser may prompt you to allow a script to run before allowing this action.

d) A small window will come up where you can enter a new folder name. Type the following and click OK: learn_ham

e) Repeat the process above to create another new folder and name that one learn_spam

f) Once your new folders are created, you can log out and log back in as the next user and repeat the process of creating your new learning folders.

 

3) Next we need to install a CGI script that will automatically feed the sorted emails into SpamAssassin’s learning module (sa-learn). To give credit where credit is due, I use a customized script based on a script created by Ian Douglas (http://iandouglas.com/spamassassin-trainer/). From his website you can download a CGI script that is customized to your needs. Originally I used his script but I decided that I wanted mine to work a bit differently so I modified his to come up with something a bit more suited to my needs. The choice is yours on how you want to proceed with the actual training, but the rest of this guide will be referring to my custom version of his script. It should be noted that his original script works with both MailDir and Mbox mail formats. I don’t know enough about cPanel installations to know which host providers use which, but MY script ONLY works with MailDir, which is what my site uses (hosted here at MDD Hosting).

a) Download the salearning.cgi script here. (( Right-click to save to your computer! ))

b) You will need to edit it before you can use it, as follows:

c) Scroll down to the MAIN SETUP area and edit the variables to match your information. You will need to enter your domain name (such as mydomain.com) and your cPanel username. There is also an option to automatically delete the messages after learning is done. I recommend “Y” since it cleans things up nicely when it is finished. If you choose “N” then you will need to manually go into your learning folders through webmail every time you finish training so that you can manually remove the already-learned emails.

d) Scroll down a bit further and you will see a boxed off section that begins with “Specify users to scan mail for…….”. There are already two set up: backup and spam. These are what I use (again, I use the “backup” option, where all of the incoming email gets forwarded to that single address) but if you are going to do your sorting and scanning on each mailbox, then you need to create new lines just like the ones shown, substituting the mailbox name in quotes with your own mailbox name. Repeat for each mailbox you are scanning. Be sure that you have previously created the learn_ham and learn_spam folders, otherwise the script may error out when executed. It doesn’t matter if the learning folders are empty or not, just so long as they exist.

e) Once you are done editing the file, save it then upload it to the .spamassassin folder found in the root level of your site. To do so, go into the File Manager through cPanel and browse to the .spamassassin folder (as previously mentioned when editing the user_prefs file), then click the button at the top of the window to upload a file. Browse your computer for the salearning.cgi file that you edited and upload it to the .spamassassin folder on your site. You may need to ensure that the permissions are set correctly on the CGI script. At a minimum, they should be set to 750. To check them, right-click on the CGI script in the File Manager and choose Change Permissions from the menu.
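The permission check in step e) can also be done from a script rather than the File Manager. This is only a sketch: the helper name is made up for illustration, and it assumes you are able to run Python on the server.

```python
import os
import stat


def ensure_mode(path, wanted=0o750):
    """Set `path` to the `wanted` permission bits if it differs.

    Returns the permission bits the file has afterwards.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != wanted:
        os.chmod(path, wanted)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling ensure_mode(os.path.expanduser("~/.spamassassin/salearning.cgi")) should return 0o750, which is the same 750 shown in the Change Permissions dialog.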

4) Create a cron job to automatically feed your ham and spam into sa-learn every week.

a) In cPanel, click on the Cron Jobs shortcut.

b) It’s a good idea to have the cron manager send you an email every time the script runs, so go ahead and enter your email address here if it isn’t there already.

c) Under the Add New Cron Job section, enter the following:

 

Minute: 0

Hour: 1

Day: *

Month: *

Weekday: 1

Command: ~/.spamassassin/salearning.cgi

 

d) Click the Add New Cron Job button and all should be well. With the settings above, the cron job runs every Monday at 1:00 AM server time (weekday 1 is Monday and hour 1 is 1 AM), which is effectively late Sunday night. This gives you all week to sort your ham and spam at your leisure. The nice thing about it is that if you don’t have time, the job will still run but no harm is done; it simply won’t learn anything. If you’d like to test your cron job, you can edit it and temporarily set it to run every minute (using the default option). Wait a minute and check your email to see that you got the results. If all is well, be sure to change it back to the settings above (or whatever your preference is).
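For anyone more used to raw crontab syntax, the field values listed above (minute 0, hour 1, weekday 1) amount to the single line 0 1 * * 1 followed by the command. In standard cron, weekday 0 is Sunday and 1 is Monday, so this fires at 01:00 on Mondays. Here is a small Python sketch that works out the next run time for a weekly entry like this one. The function name is my own, and it only handles the simple "one weekday, one time" case, not full cron syntax.

```python
from datetime import datetime, timedelta

# The cPanel fields from the guide written as one crontab line.
CRON_LINE = "0 1 * * 1 ~/.spamassassin/salearning.cgi"


def next_weekly_run(now, minute=0, hour=1, cron_weekday=1):
    """Return the next time a weekly cron entry fires, strictly after `now`.

    cron numbers weekdays with 0 (or 7) = Sunday, while Python's
    datetime.weekday() uses 0 = Monday, so the value is converted first.
    """
    python_weekday = (cron_weekday - 1) % 7
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(python_weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

Passing in datetime.now() tells you when the next training run is due, using the server's clock if you run it there.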

 

 

So what’s next? All you need to do is sign in to your email account (via webmail) every few days or once a week (whenever you want) and go through the messages in your inbox. Select all of the ham (good email), then use the move feature in Horde and choose the learn_ham folder as the destination. The messages will disappear from the inbox and be moved to the learn_ham folder. Next, do the same for your spam: select all of the spam messages and move them to the learn_spam folder. If there are some that you are not sure what to do with, just delete them without moving them.

 

It takes a while to build up a large database of ham and spam in SpamAssassin but once the bayes filtering kicks in, you should notice a drop in spam (or at least more accurate classification). After that, you can begin to tweak some of your scores a bit more to tighten things up a bit.
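To give a rough idea of what "kicks in" means: as far as I know, SpamAssassin's stock settings (bayes_min_ham_num and bayes_min_spam_num) require 200 learned ham and 200 learned spam messages before Bayes scores are applied, and sa-learn --dump magic reports the current counts. The Python sketch below shows the kind of sa-learn commands a training script ends up running and the threshold check. The function names are hypothetical and this is not the code inside salearning.cgi; your server may also use different minimums.

```python
def sa_learn_command(kind, folder):
    """Build the sa-learn argument list for a folder of sorted messages.

    `kind` must be "ham" or "spam"; --ham and --spam are sa-learn's own options.
    """
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    return ["sa-learn", f"--{kind}", folder]


def bayes_ready(nham, nspam, minimum=200):
    """Return True once both counts reach the minimum.

    Assumption: 200 is the stock default for both ham and spam.
    """
    return nham >= minimum and nspam >= minimum
```

The returned list can be passed to subprocess.run() if you want to run a training pass by hand instead of waiting for the cron job.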

 

I hope this helps and please direct any questions or comments to this thread rather than bombarding MDD Hosting with questions. As mentioned in the beginning, this isn’t something they normally cover.


  • 4 years later...
Thank you very much for sharing your knowledge here and providing such detailed instructions on configuring SpamAssassin.
At this time I would like to forgo any server-side SPAM management and handle all SPAM on the client side.
What do I need to do to disable all SPAM filtering on the server?
I often don't receive messages that are NOT spam. Email Trace shows that they reach scanner01.mail.supportedns.com but fail there.
Apache SpamAssassin is currently enabled in my cPanel account, but I don't see an option to disable it.
Here is one example of a message delivery failure from scanner01.mail.supportedns.com from the Email Trace in cPanel:
Event: failure error
User: -remote-
Domain: 
Sender: noreply@EXAMPLE.com
Sent Time: Oct 13, 2015 7:16:07 AM
Sender Host: scanner01.mail.supportedns.com
Sender IP Address: 162.244.253.254
Authentication: localdelivery
Spam Score: 0
Recipient: test@MyDomain.com
Delivery User: myAccount
Delivery Domain: MyDomain.com
Delivered To: 
Router: smarthost_regular
Transport: remote_smtp_smart_regular
Out Time: Oct 13, 2015 7:17:07 AM
ID: 1Zlhed-000pjH-8l
Delivery Host: scanners.mail.supportedns.com
Delivery IP Address: 162.244.253.253
Size: 3.89 KB
Result: DHE-RSA-AES256-SHA:256: SMTP error from remote mail server after MAIL FROM:<noreply@EXAMPLE.com> SIZE=5077: 550-Verification failed for <noreply@EXAMPLE.com>\n550-Called: 12.188.100.82\n550-Sent: RCPT TO:<noreply@EXAM.....RESULTS CUT OFF HERE

Here are some more details:

Server: Boreas
SpamAssassin: Enabled (Don't know how to disable)
SpamAssassin - Spam Auto-Delete: Disabled
SpamAssassin - Spam Box: Disabled
Thank you in advance for any suggestions.

 

Thank you very much for sharing your knowledge here and providing such detailed instructions on configuring SpamAssassin.
At this time I would like to forgo any server-side SPAM management and handle all SPAM on the client side.
What do I need to do to disable all SPAM filtering on the server?

 

I have a hosting account with someone else, and just under the words that say, "Apache SpamAssassin is currently enabled," there is a button that says Enable/Disable. My guess is that MDD turned off the ability for us to disable this for ourselves. Maybe they're making sure they catch all virus-like attachments or something. If you are getting email that is deleted, you might be able to override the server values with your own user_prefs file in your .spamassassin folder. You'll have to do some research though, as I'm not sure what the server's file is. You'd probably have to find the identical value and then just set it lower.

 

It does look like some mail is just automatically rejected outright before it even gets to my rules though, but all these say the sender is in an RBL, so I'm fine with that.

 

In regard to the OP, thanks for writing that up. I kind of took a different route, and started with a good user_prefs list online, then adjusted the values based on the SPAM that was coming in most often. This reduced the amount of spam coming in to my wife's account from about 400-500 a day to maybe two or three. I monitored the SPAM box for a month or so, to make sure there were no legitimate emails getting snagged there. At some point, I'm going to just set the auto-delete.

 

But this article will be helpful for setting up the auto-learning, which I didn't do. I would like to set some of the triggers at lower values (the reverse DNS was one I set particularly high, to 6.8, because it seems every legitimate piece of mail she gets passes that test fine), but I'm a little leery of that one. So I think I'll set up the learning, and then I can set the reverse DNS to a lower value. So thanks for writing that up!
