Bayesian Classifier


IMPORTANT: You have to train with at least 1 ham and 1 spam message before the classifier will operate.

VERY IMPORTANT: Enabling the classifier with a dozen ham messages and hundreds of spam messages, or vice versa, will almost certainly result in misclassified messages.

How does this work?

The Bayesian classifier calculates the conditional probability that a message is spam or not spam, based on a database of words found in spam and non-spam messages. The database is specific to this mailbox and is built from the messages that you submit for training.
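To make the idea of a per-mailbox word database concrete, here is a minimal Python sketch. It is not the service's implementation: the class name WordDatabase, its methods and the simple tokenizer are hypothetical, and the real tokenizer evidently also records header tokens (the sample report in "Looking inside the classifier" shows prefixes such as head:, rcvd: and subj:). Each word is counted once per message. frequencies() returns the fraction of trained ham and spam messages that contain a word, which appears to be what that report's pgood and pbad columns show, and ready() mirrors the rule that at least 1 ham and 1 spam message must be trained.

```python
import re
from collections import Counter


class WordDatabase:
    """Hypothetical per-mailbox word database built from training messages."""

    def __init__(self) -> None:
        self.ham_messages = 0          # number of ham messages trained
        self.spam_messages = 0         # number of spam messages trained
        self.ham_counts = Counter()    # word -> ham messages containing it
        self.spam_counts = Counter()   # word -> spam messages containing it

    @staticmethod
    def tokenize(text: str) -> set:
        # Simplified tokenizer (an assumption): runs of word characters plus
        # $ ' ! and -. Returning a set counts each word once per message.
        return set(re.findall(r"[\w$'!-]+", text))

    def train(self, text: str, is_spam: bool) -> None:
        words = self.tokenize(text)
        if is_spam:
            self.spam_messages += 1
            self.spam_counts.update(words)
        else:
            self.ham_messages += 1
            self.ham_counts.update(words)

    def ready(self) -> bool:
        # The classifier needs at least 1 ham and 1 spam message.
        return self.ham_messages >= 1 and self.spam_messages >= 1

    def frequencies(self, word: str) -> tuple:
        """Return (pgood, pbad): the fraction of ham and of spam messages
        that contain the word."""
        pgood = self.ham_counts[word] / self.ham_messages if self.ham_messages else 0.0
        pbad = self.spam_counts[word] / self.spam_messages if self.spam_messages else 0.0
        return pgood, pbad
```

For example, after training "cheap watches now" and "cheap offer" as spam and "meeting now" as ham, frequencies("now") gives (1.0, 0.5): the word is in every ham message and in half of the spam messages.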

The probability that a message is spam, as opposed to not spam (commonly called ham), is expressed as a score from 0 to 1 inclusive. The classifier operates in a tri-state mode: a message is classified as 1) ham, 2) spam, or 3) unsure. Scoring is non-linear; spam messages score greater than 0.98 and ham messages less than 0.45. Messages scoring between 0.45 and 0.98 are classified as unsure.
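The tri-state rule can be written as a small function. This is a sketch using the cutoffs stated above; the function name classify is hypothetical, and treating a score of exactly 0.45 or 0.98 as unsure is an assumption based on the wording "between 0.45 and 0.98".

```python
HAM_CUTOFF = 0.45   # scores below this are ham
SPAM_CUTOFF = 0.98  # scores above this are spam


def classify(score: float) -> str:
    """Map a classifier score in [0, 1] to 'ham', 'spam' or 'unsure'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1 inclusive")
    if score > SPAM_CUTOFF:
        return "spam"
    if score < HAM_CUTOFF:
        return "ham"
    # Everything from 0.45 to 0.98 (boundaries included, by assumption).
    return "unsure"
```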

The classifier can be used in conjunction with SpamAssassin scoring or as a stand-alone spam classifier. Using the classifier with SpamAssassin scoring is the recommended configuration.

When the classifier is enabled in conjunction with SpamAssassin, a configurable value, depending on the classification, is added to the SpamAssassin score. The default values are +10 for messages classified as spam, -10 for messages classified as ham, and 0 for messages classified as unsure. The SpamAssassin scoring thresholds for the address the message was sent to are then applied to the new score to determine the final delivery folder. SpamAssassin subject tags are removed, and a new subject tag is added if that option is configured and enabled.
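A sketch of how the classifier's verdict and the SpamAssassin score combine, using the default adjustments given above. The function names, the single spam_threshold parameter and the folder names Junk and INBOX are hypothetical; the per-address thresholds in your configuration may include more than one level. The >= comparison follows SpamAssassin's own convention that a message is spam once its score reaches the required score.

```python
# Default adjustments from the text; all three values are configurable.
DEFAULT_ADJUSTMENT = {"spam": 10.0, "ham": -10.0, "unsure": 0.0}


def combined_score(sa_score: float, verdict: str,
                   adjustment: dict = DEFAULT_ADJUSTMENT) -> float:
    """Add the classifier's adjustment to the SpamAssassin score."""
    return sa_score + adjustment[verdict]


def delivery_folder(score: float, spam_threshold: float) -> str:
    """Hypothetical single-threshold routing: spam goes to 'Junk'."""
    return "Junk" if score >= spam_threshold else "INBOX"
```

For example, with a spam threshold of 5.0, a message that SpamAssassin scored 3.0 but the classifier called spam ends up at 13.0 and is filed as junk, while one scored 7.5 but classified as ham drops to -2.5 and is delivered normally.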

When the classifier is enabled and SpamAssassin scoring is disabled, subject tags added by SpamAssassin are removed, a new subject tag is added if enabled, and an X-Junkmail header is added if enabled.
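The header changes described above can be sketched with Python's standard email package. The tag strings "*****SPAM*****" and "[SPAM]", adding the new tag only to spam, and using the verdict as the X-Junkmail value are all assumptions for illustration; the actual tags and header contents depend on your configuration.

```python
import re
from email.message import EmailMessage

SA_TAG = "*****SPAM*****"  # hypothetical SpamAssassin subject tag
BAYES_TAG = "[SPAM]"       # hypothetical classifier subject tag


def retag(msg: EmailMessage, verdict: str,
          add_tag: bool = True, add_header: bool = True) -> EmailMessage:
    """Strip the SpamAssassin subject tag, then optionally add a new tag
    and an X-Junkmail header (value format is an assumption)."""
    subject = msg.get("Subject", "")
    # Remove a SpamAssassin tag at the start of the subject.
    subject = re.sub(r"^\s*" + re.escape(SA_TAG) + r"\s*", "", subject)
    if add_tag and verdict == "spam":
        subject = f"{BAYES_TAG} {subject}"
    if "Subject" in msg:
        msg.replace_header("Subject", subject)
    elif subject:
        msg["Subject"] = subject
    if add_header:
        del msg["X-Junkmail"]  # no error if absent; avoids duplicates
        msg["X-Junkmail"] = verdict
    return msg
```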


Submitting messages for training

Start by creating these two top-level IMAP folders if they do not already exist:
Auto-Train/Spam    Spam messages
Auto-Train/Ham     Non-spam (ham) messages
Messages found in these two folders are used to train the Bayesian classifier on what you consider spam and ham.
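The folders can be created in any mail client, or with Python's standard imaplib module. This is a minimal sketch: the connection details passed to connect_and_prepare are placeholders, and it assumes the server uses "/" as its hierarchy delimiter (servers that use "." would need Auto-Train.Spam and Auto-Train.Ham instead).

```python
import imaplib

# Folder names from the text; assumes "/" is the hierarchy delimiter.
TRAINING_FOLDERS = ("Auto-Train/Spam", "Auto-Train/Ham")


def ensure_training_folders(conn) -> list:
    """Create the training folders on a logged-in IMAP connection.

    imaplib reports a failed CREATE (for example, because the folder
    already exists) as a 'NO' status instead of raising, so only the
    folders that were actually created are returned.
    """
    created = []
    for name in TRAINING_FOLDERS:
        status, _detail = conn.create(name)
        if status == "OK":
            created.append(name)
    return created


def connect_and_prepare(host: str, user: str, password: str) -> list:
    # Placeholder connection details; IMAP4_SSL connects on port 993
    # and the with-block logs out when done.
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        return ensure_training_folders(conn)
```

Running it a second time is harmless: the server refuses to create folders that already exist and the function simply returns an empty list.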

The default configuration is to move trained messages into the Auto-Train/Trained/Ham folder or into the Auto-Train/Trained/Spam folder as appropriate. Those folders will be created if they do not exist.

If notices are enabled, a report will be delivered to your mailbox. Here is a sample report after submitting 4 spam messages for training.
Your bayes database for this mailbox has been trained with:
124 Ham messages
847 Spam messages
Classifier Status: Enabled  http://manage.mxes.net/index.php?help=bayes

0 messages were added as Ham
2 messages were added as Spam and moved to the Auto-Train/Trained/Spam folder
2 Spam messages were already properly classified and moved to the Auto-Train/Trained/Spam folder

Messages trained as Spam

Sender                              Subject
----------------------------------------------------------------------------
<kdjbr09wnwcr@infoconex.com>        LOW SALARY IS TOUGH FOR EVERYTHING says 
<10A17BSG@concierge3.boomerang.com> Moving? Move online for a $30 Gift Card.

We think the best method for training is "training on error": adding to the database only those messages that were not classified correctly. This method works very well in conjunction with SpamAssassin scoring with as few as 20 or 30 messages in the database, but training with several hundred messages is recommended. Our testing shows that the classifier works best with more spam messages than ham messages in the database.
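The sample report above is consistent with training on error: of the 4 spam messages submitted, the 2 already classified as spam were only moved, and the other 2 were added to the database and moved. A sketch of that loop follows; classify and train are hypothetical hooks standing in for the classifier, moves are recorded rather than performed, and treating an unsure result as an error is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class TrainingResult:
    added: int = 0              # messages added to the database
    already_correct: int = 0    # messages the classifier already got right
    moves: list = field(default_factory=list)  # (message, destination folder)


def train_on_error(messages: Iterable, label: str,
                   classify: Callable, train: Callable) -> TrainingResult:
    """Process the messages found in Auto-Train/Spam or Auto-Train/Ham.

    label    -- "spam" or "ham": the folder the user filed the message in
    classify -- hypothetical hook returning "spam", "ham" or "unsure"
    train    -- hypothetical hook that adds a message to the database
    """
    destination = f"Auto-Train/Trained/{label.capitalize()}"
    result = TrainingResult()
    for msg in messages:
        if classify(msg) == label:
            result.already_correct += 1   # correct already: not added
        else:
            train(msg, label)             # wrong or unsure: train on it
            result.added += 1
        result.moves.append((msg, destination))  # every message is moved
    return result
```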




Clearing the database

If for whatever reason you want to empty the database and start over, click 'Clear Bayes Database' and confirm when prompted; a request will then be queued to create an empty database. The Ham and Spam counts shown are cleared, but it may take a few minutes before the new empty database is actually created.


Looking inside the classifier

If you want to know why a message was classified a certain way, copy or move it into the Auto-Train/Report folder. A report will be delivered to your INBOX in a minute or two.

The default configuration is to move messages into the Auto-Train/Trained/Report folder after the report is generated. That folder will be created if it does not exist.

Sample report:
Y 1.000000
   int  cnt   prob  spamicity histogram
  0.00    0 0.000000 0.000000 
  0.10    0 0.000000 0.000000 
  0.20    0 0.000000 0.000000 
  0.30    0 0.000000 0.000000 
  0.40    0 0.000000 0.000000 
  0.50    0 0.000000 0.000000 
  0.60    0 0.000000 0.000000 
  0.70    0 0.000000 0.000000 
  0.80    2 0.894564 0.948116 ##
  0.90   19 0.969343 0.962241 ###################

                                      n    pgood     pbad      fw     U
"always"                              8  0.062500  0.028037  0.310144 -
"head:Message-Id"                   102  0.737500  0.401869  0.352741 -
"make"                               17  0.112500  0.074766  0.399378 -
"like"                               22  0.137500  0.102804  0.427882 -
"with"                              106  0.575000  0.560748  0.493730 -
"head:Date"                         187  1.000000  1.000000  0.500002 -
"Don?t"                               0  0.000000  0.000000  0.520000 -
"Let?s"                               0  0.000000  0.000000  0.520000 -
"afterthought.de.revetments.net"       0  0.000000  0.000000  0.520000 -
"beatify.tv.revetments.net"           0  0.000000  0.000000  0.520000 -
"ejaculation"                         0  0.000000  0.000000  0.520000 -
"from:Miles"                          0  0.000000  0.000000  0.520000 -
"from:gamebox.net"                    0  0.000000  0.000000  0.520000 -
"from:gilduff1978"                    0  0.000000  0.000000  0.520000 -
"head:85.68.133.116"                  0  0.000000  0.000000  0.520000 -
"magic"                               0  0.000000  0.000000  0.520000 -
"rcvd:38.113.3.78"                    0  0.000000  0.000000  0.520000 -
"rcvd:85.68.133.116"                  0  0.000000  0.000000  0.520000 -
"rcvd:gamebox.net"                    0  0.000000  0.000000  0.520000 -
"rcvd:kenjm"                          0  0.000000  0.000000  0.520000 -
"rcvd:mx2.gamebox.net"                0  0.000000  0.000000  0.520000 -
"rtrn:gamebox.net"                    0  0.000000  0.000000  0.520000 -
"rtrn:gilduff1978"                    0  0.000000  0.000000  0.520000 -
"seen!"                               0  0.000000  0.000000  0.520000 -
"steel!"                              0  0.000000  0.000000  0.520000 -
"subj:Urgent"                         0  0.000000  0.000000  0.520000 -
"subj:manager"                        0  0.000000  0.000000  0.520000 -
"subj:message"                        0  0.000000  0.000000  0.520000 -
"tab!"                                0  0.000000  0.000000  0.520000 -
"to:kenjm"                            0  0.000000  0.000000  0.520000 -
"thing"                               5  0.025000  0.028037  0.528604 -
"head:charset"                       39  0.187500  0.224299  0.544670 -
"head:plain"                         45  0.212500  0.261682  0.551847 -
"Now"                                 8  0.037500  0.046729  0.554708 -
"head:MIME-Version"                 111  0.500000  0.663551  0.570273 -
"head:Content-Transfer-Encoding"      43  0.187500  0.261682  0.582549 -
"http"                              116  0.500000  0.710280  0.586862 -
"trust"                               3  0.012500  0.018692  0.598783 -
"best"                               12  0.050000  0.074766  0.599134 -
"possible"                           12  0.050000  0.074766  0.599134 -
"head:Content-Type"                 127  0.525000  0.794393  0.602078 -
"you"                               107  0.437500  0.672897  0.605983 -
"head:text"                          56  0.225000  0.355140  0.612133 -
"the"                               111  0.425000  0.719626  0.628682 -
"rcvd:Jun"                          140  0.500000  0.934579  0.651449 -
"rcvd:from"                         140  0.500000  0.934579  0.651449 -
"rcvd:EDT"                          137  0.462500  0.934579  0.668933 -
"rcvd:for"                          124  0.412500  0.850467  0.673366 -
"rcvd:Postfix"                      136  0.450000  0.934579  0.674971 -
"rcvd:with"                         126  0.412500  0.869159  0.678129 -
"our"                                58  0.187500  0.401869  0.681814 -
"hard"                                4  0.012500  0.028037  0.690882 -
"head:Build"                          4  0.012500  0.028037  0.690882 -
"rcvd:Tue"                          112  0.350000  0.785047  0.691615 -
"You"                                47  0.137500  0.336449  0.709812 -
"rcvd:ESMTP"                        112  0.312500  0.813084  0.722334 -
"head:X-Mailer"                      44  0.112500  0.327103  0.743997 -
"head:MimeOLE"                       12  0.025000  0.093458  0.788556 -
"head:Produced"                      12  0.025000  0.093458  0.788556 -
"head:X-MimeOLE"                     12  0.025000  0.093458  0.788556 -
"head:Microsoft"                     24  0.050000  0.186916  0.788755 -
"head:Outlook"                       21  0.037500  0.168224  0.817465 -
"head:quoted-printable"               8  0.012500  0.065421  0.838871 -
"rcvd:esmtp"                         72  0.112500  0.588785  0.839501 -
"rcvd:helo"                          70  0.087500  0.588785  0.870528 -
"rcvd:flinet.com"                    41  0.050000  0.345794  0.873518 -
"ever"                               12  0.012500  0.102804  0.891040 +
"to:tuffmail.com"                    64  0.062500  0.551402  0.898087 +
"rcvd:example.com"                   67  0.062500  0.579439  0.902537 +
"rcvd:someone"                       67  0.062500  0.579439  0.902537 +
"head:cbl.abuseat.org"               38  0.025000  0.336449  0.930642 +
"head:listed"                        38  0.025000  0.336449  0.930642 +
"rcvd:129.250.36.44"                 21  0.012500  0.186916  0.936964 +
"rcvd:129.250.36.54"                 21  0.012500  0.186916  0.936964 +
"head:X-Verio_Spamtag"               51  0.025000  0.457944  0.948085 +
"Delete"                              1  0.000000  0.009346  0.991605 +
"dreamt"                              1  0.000000  0.009346  0.991605 +
"gall2"                               1  0.000000  0.009346  0.991605 +
"head:Windows-1252"                   1  0.000000  0.009346  0.991605 +
"it?s"                                1  0.000000  0.009346  0.991605 +
"rock"                                1  0.000000  0.009346  0.991605 +
"erections"                           2  0.000000  0.018692  0.995766 +
"head:Office"                         2  0.000000  0.018692  0.995766 +
"head:Thread-Index"                   2  0.000000  0.018692  0.995766 +
"head:V6.00.2800.1106"                3  0.000000  0.028037  0.997169 +
"spur"                                3  0.000000  0.028037  0.997169 +
"rm.php"                              4  0.000000  0.037383  0.997873 +
N_P_Q_S_s_x_md                       21  0.000000  1.000000  1.000000
                                         0.017800  0.520000  0.375000
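The numbers in this report are consistent with Gary Robinson's method, although that is inferred from the numbers rather than documented here. Assume that pgood and pbad are the fractions of trained ham and spam messages containing the word, that the three values on the last line are Robinson's strength s (0.0178), the assumed probability x for an unseen word (0.52) and the minimum deviation from 0.5 a word needs before it is used (0.375). Then fw = (s*x + n*p) / (s + n), with p = pbad / (pbad + pgood), reproduces the fw column. For example, "ever" (n = 12, pgood 0.0125, pbad 0.102804) gives 0.891040, and unseen words get exactly 0.52. The U column marks words with |fw - 0.5| >= 0.375. The spamicity function assumes Robinson's inverse chi-square (Fisher) combination of the used words; for the 21 "+" words it yields 1.000000, matching the "Y 1.000000" line.

```python
import math

# Values from the last line of the sample report (interpretation assumed).
ROBS = 0.0178    # s: strength given to the prior
ROBX = 0.52      # x: assumed spam probability of an unseen word
MIN_DEV = 0.375  # words with |fw - 0.5| below this are not used


def fw(pgood: float, pbad: float, n: int) -> float:
    """Robinson's f(w) = (s*x + n*p) / (s + n)."""
    total = pgood + pbad
    # total == 0 only for unseen words (n == 0), where p does not matter.
    p = pbad / total if total > 0 else ROBX
    return (ROBS * ROBX + n * p) / (ROBS + n)


def is_used(f: float) -> bool:
    """The report's U column: '+' when fw is far enough from 0.5."""
    return abs(f - 0.5) >= MIN_DEV


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-square value (even dof only)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamicity(fws: list) -> float:
    """Fisher/Robinson combination of used words; no words gives 0.5."""
    n = len(fws)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in fws), 2 * n)
    return (1.0 + s - h) / 2.0
```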

You can read about what this report means here
 
 