QSF(1)				 User Manuals			       QSF(1)

NAME
       qsf - quick spam filter

SYNOPSIS
       Filtering:	qsf [-snrAtav] [-d DB] [-g DB]
			    [-L LVL] [-S SUBJ] [-Q NUM] [-X NUM]
       Training:	qsf -T SPAM NONSPAM [MAXROUNDS] [-d DB]
       Benchmark:	qsf -B SPAM NONSPAM [MAXROUNDS] [-d DB]
       Retraining:	qsf -[m|M] [-d DB] [-w WEIGHT] [-aN]
       Database:	qsf -[p|D|R|O] [-d DB]
       Database merge:	qsf -E OTHERDB [-d DB]
       Allowlist query: qsf -e EMAIL [-m|-M|-t] [-d DB] [-g DB]
       Help:		qsf -[h|l|V]

DESCRIPTION
       qsf  reads a single email on standard input, and by default outputs it
       on standard output.  If the email is determined to be spam,  an	addi-
       tional  header  ("X-Spam: YES") will be added, and optionally the sub-
       ject line can have "[SPAM]" prepended to it.

       qsf is intended to be used in a procmail(1) recipe, in a ruleset	 such
       as this:

	       :0 wf
	       | qsf -ra

	       :0 H:
	       * X-Spam: YES
	       $HOME/mail/spam

       For more examples, including sample procmail(1) recipes, see the EXAM-
       PLES section below.
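
       As a rough illustration of how a script further down the delivery
       chain might act on this header, the Python sketch below checks a
       filtered message for "X-Spam: YES".  The function name is_tagged_spam
       and the sample message are hypothetical; qsf itself provides no Python
       interface.

```python
from email import message_from_string


def is_tagged_spam(filtered_message):
    """Return True if the message carries the "X-Spam: YES" header."""
    headers = message_from_string(filtered_message)
    # Header lookup is case-insensitive; a missing header gives "".
    value = headers.get("X-Spam", "")
    return value.strip().upper() == "YES"


# Hypothetical message, as it might look after passing through qsf.
sample = "From: a@example.com\nX-Spam: YES\nSubject: hi\n\nbody\n"
print(is_tagged_spam(sample))
```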

TRAINING
       Before qsf can be used properly, it needs to be trained.	 A  good  way
       to train qsf is to collect a copy of all your email into two folders -
       one for spam, and one for non-spam.  Once you have done this, you  can
       use the training function, like this:

	       qsf -aT spam-folder non-spam-folder

       This will generate a database that can be used by qsf to guess whether
       email received in the future is spam or not.  Note that	this  initial
       training	 run  may take a long time, but you should only need to do it
       once.

       To mark a single message as spam, pipe it to qsf with the  --mark-spam
       or  -m ("mark as spam") option.	This will update the database accord-
       ingly and discard the email.

       To mark a single message as non-spam, pipe it to qsf with the  --mark-
       nonspam	or  -M ("mark as non-spam") option.  Again, this will discard
       the email.

       If a message has been mis-tagged, simply send it to qsf as  the	oppo-
       site type, i.e. if it has been mistakenly tagged as spam, pipe it into
       qsf --mark-nonspam --weight=2 to add it to the non-spam	side  of  the
       database with double the usual weighting.

OPTIONS
       The qsf options are listed below.

       -d, --database [TYPE:]FILE
	      Use  FILE as the spam/non-spam database.	The default is to use
	      /var/lib/qsfdb and, if that is not available or  is  read-only,
	      $HOME/.qsfdb.   This  option  can	 also be useful if there is a
	      system-wide database but you do not want to use it - specifying
	      your own here will override the default.

	      If   you	 prefix	 the  filename	with  a	 TYPE,	of  the	 form
	      btree:$HOME/.qsfdb,  then	 this  will  specify  what  kind   of
	      database FILE is, such as btree, gdbm, sqlite and so on.	Check
	      the output of qsf -V to see which database backends are  avail-
	      able.   The default is to auto-detect the type, or, if the file
	      does not already exist, use btree.  Note that TYPE is not case-
	      sensitive.

       -g, --global [TYPE:]FILE
	      Use   FILE   as	the   default  global  database,  instead  of
	      /var/lib/qsfdb.  If you also specify a database with  -d,	 then
	      this  "global"  database will be used in read-only mode in con-
	      junction with the read-write database specified with  -d.	  The
	      -g  option  can  be  used	 a  second  time  to  specify a third
	      database, which will also be used in  read-only  mode.   Again,
	      the filename can optionally be prefixed with a TYPE which spec-
	      ifies the database type.

       -s, --subject
	      Rewrite the Subject line of any email  that  turns  out  to  be
	      spam, adding "[SPAM]" to the start of the line.

       -S, --subject-marker SUBJECT
	      Instead  of adding "[SPAM]", add SUBJECT to the Subject line of
	      any email that turns out to be spam.  Implies -s.

       -n, --no-header
	      Do not add an X-Spam header to messages.

       -r, --add-rating
	      Insert an additional header X-Spam-Rating which is a rating  of
	      the  "spamminess"	 of a message from 0 to 100; 90 and above are
	      counted as spam, anything under 90 is not considered spam.   If
	      combined	with  -t,  then the rating (0-100) will be output, on
	      its own, on standard output.

       -A, --asterisk
	      Insert an additional header  X-Spam-Level	 which	will  contain
	      between 0 and 20 asterisks (*), depending on the spam rating.

       -t, --test
	      Instead  of  passing the message out on standard output, output
	      nothing, and exit 0 if the message is not spam, or  exit	1  if
	      the message is spam.  If combined with -r, then the spam rating
	      will be output on standard output.

       -a, --allowlist
	      Enable the allow-list.  This causes the email address given  in
	      the  message's  "From:" header to be checked against a list; if
	      it matches, then the message is  always  treated	as  non-spam,
	      regardless  of  what  the	 token database says.  When specified
	      with a retraining flag, -a -m (mark as spam) will	 remove	 that
	      address  from  the allow-list as well as marking the message as
	      spam, and -a -M (mark as non-spam) will add that address to the
	      allow-list  as  well  as	marking the message as non-spam.  The
	      idea is that you add all of your friends to the allow-list, and
	      then none of their messages ever get marked as spam.

       -L, --level, --threshold LEVEL
	      Change  the  spam scoring threshold level which must be reached
	      before an email is classified as spam.  The default is 90.

       -Q, --min-tokens NUM
	      Only give a score if more than NUM tokens are found in the mes-
	      sage  - otherwise the message is assumed to be non-spam, and it
	      is not modified in any way.  The default	is  0.	 This  option
	      might  be useful if you find that very short messages are being
	      frequently miscategorised.

       -e, --email, --email-only EMAIL
	      Query or update the allow-list  entry  for  the  email  address
	      EMAIL.  With no other options, this will simply output "YES" if
	      EMAIL is in the allow-list, or "NO" if it is not. With  -t,  it
	      will not output anything, but will exit 0 (success) if EMAIL is
	      in the allow-list, or 1 (failure) if it is  not.	With  the  -m
	      (mark-spam)  option,  any	 previous  allow-list entry for EMAIL
	      will be removed. Finally, with the  -M  (mark-nonspam)  option,
	      EMAIL  will  be added to the allow-list if it is not already on
	      it.

	      If EMAIL is just the word MSG on its own, then an email will be
	      read  from  standard  input, and the email address given in the
	      "From:" header will be used.

	      Using -e automatically switches on -a.

       -v, --verbose
	      Add extra X-QSF-Info headers to any filtered email,  containing
	      error  messages  and so on if applicable.	 Specify -v more than
	      once to increase verbosity.

       -T, --train SPAM NONSPAM [MAXROUNDS]
	      Train the database using the two mbox folders SPAM and NONSPAM,
	      by  testing  each	 message  in  each  folder  and	 updating the
	      database each time a message is miscategorised.  This  is	 done
	      several  times,  and  may	 take a while to run.  Specify the -a
	      (allow-list) flag to add every sender in the NONSPAM folder  to
	      your  allow-list	as a side-effect of the training process.  If
	      MAXROUNDS is specified, training will end after this number  of
	      rounds if the results are still not good enough. The default is
	      a maximum of 200 rounds.

       -m, --mark-spam
	      Instead of passing the message out on standard output, mark its
	      contents	as  spam and update the database accordingly.  If the
	      allow-list (-a) is enabled, the message's	 "From:"  address  is
	      removed from the allow-list.

       -M, --mark-nonspam
	      Instead of passing the message out on standard output, mark its
	      contents as non-spam and update the database  accordingly.   If
	      the  allow-list  (-a) is enabled, the message's "From:" address
	      is added to the allow-list (see the -a option above).

       -w, --weight WEIGHT
	      When marking as spam or non-spam, update the  database  with  a
	      weighting	 of  WEIGHT  per  token	 instead of the default of 1.
	      Useful when correcting mistakes, eg a  message  that  has	 been
	      mistakenly  detected as spam should be marked as non-spam using
	      a weighting of 2, i.e. double the usual weighting, to  counter-
	      act the error.

       -N, --no-autoprune
	      When  marking as spam or nonspam, never automatically prune the
	      database.	 Usually the  database	is  pruned  after  every  500
	      marks;  if you would rather --prune manually, use -N to disable
	      automatic pruning.

       -p, --prune
	      Remove redundant entries from the database and clean  it	up  a
	      little.	This  is  automatically	 done  after several calls to
	      --mark-spam or --mark-nonspam, and during training with --train
	      if  the  training	 takes a large number of rounds, so it should
	      rarely be necessary to use  --prune  manually  unless  you  are
	      using -N / --no-autoprune.

       -X, --prune-max NUM
	      When  the	 database  is  being pruned, no more than NUM entries
	      will be considered for removal.  This is	to  prevent  CPU  and
	      memory  resources being taken over.  The default is 100,000 but
	      in some circumstances (if you find that pruning takes too long)
	      this  option may be used to reduce it to a more manageable num-
	      ber.

       -D, --dump [FILE]
	      Dump the contents of the	database  as  a	 platform-independent
	      text  file, suitable for archival, transfer to another machine,
	      and so on.  The data is output on	 stdout	 or  into  the	given
	      FILE.

       -R, --restore [FILE]
	      Rebuild  the database from scratch from the text file on stdin.
	      If a FILE is given, data is read from  there  instead  of	 from
	      stdin.

       -O, --tokens
	      Instead  of filtering, output a list of the tokens found in the
	      message read from standard input,	 along	with  the  number  of
	      times each token was found.  This is only useful if you want to
	      use qsf as a general tokeniser for use with  another  filtering
	      package.

       -E, --merge OTHERDB
	      Merge the OTHERDB database into the current database.  This can
	      be useful if you want to take one user's mailbox and  merge  it
	      into  the system-wide one, for instance (this would be done by,
	      as root, doing qsf -d /var/lib/qsfdb -E  /home/user/.qsfdb  and
	      then removing /home/user/.qsfdb).

       -B, --benchmark SPAM NONSPAM [MAXROUNDS]
	      Benchmark	 the training process using the two mbox folders SPAM
	      and NONSPAM.  A temporary database is created and trained using
	      the  first  75%  of  the	messages in each folder, and then the
	      entire contents of each folder is tested to see how many	false
	      positives and false negatives occur. Some timing information is
	      also displayed.

	      This can be used to decide which backend is best on  your	 sys-
	      tem.   Use  -d  to  select a backend, eg qsf -B spam nonspam -d
	      GDBM - this will create a temporary database which  is  removed
	      afterwards.

	      The exception to this is the MySQL backend, where a full
	      database specification must be given
	      (-d MySQL:database=db;host=localhost;...) and the database
	      table given will not be wiped beforehand or dropped afterwards.

	      As with -T, if MAXROUNDS is specified, training will  never  be
	      done for more than this number of rounds; the default is 200.

       -h, --help
	      Print a usage message on standard output and exit successfully.

       -l, --license
	      Print details of the program's license on standard  output  and
	      exit successfully.

       -V, --version
	      Print  version  information,  including  a  list	of  available
	      database backends, on standard output and exit successfully.

FILES
       /var/lib/qsfdb
	      The default (system-wide) spam database.  If you wish to
	      install qsf system-wide, this should be read-only to everyone;
	      there should be one user with write access who can update the
	      spam database with qsf --mark-spam and qsf --mark-nonspam when
	      necessary.

       /var/lib/qsfdb2
	      A second, read-only, system-wide database. This can  be  useful
	      when  installing	qsf  system-wide  and  using third-party spam
	      databases; the first global database can be updated  with	 sys-
	      tem-specific  changes, and this second database can be periodi-
	      cally updated when the third-party spam database is updated.

       $HOME/.qsfdb
	      The default spam database for  per-user  data.   Users  without
	      write  access  to the system-wide database will have their data
	      written here, and the two databases will be read together.  The
	      per-user	database  will	be given a weighting equivalent to 10
	      times the weighting of the global database.

NOTES
       Currently, you cannot use qsf to check for spam while the database  is
       being  updated.	 This  means that while an update is in progress, all
       email is passed through as non-spam.

       There is an upper size limit of	512Kb  on  incoming  email;  anything
       larger than this is just passed through as non-spam, to avoid tying up
       machine resources.

EXAMPLES
       To filter all of your mail through qsf, with  the  allow-list  enabled
       and the "spam rating" header being added, add this to your .procmailrc
       file:

	       :0 wf
	       | qsf -ra

       If you want qsf to add "[SPAM]" to the subject line of any messages it
       thinks are spam, do this instead:

	       :0 wf
	       | qsf -sra

       To automatically mark any email sent to spambox@yourdomain.com as spam
       (this is the "naive" version):

	       :0 H
	       * ^To:.*spambox@yourdomain.com
	       | qsf -am

       To do the same, but cleverly, so that only  email  to  spambox@yourdo-
       main.com	 which	qsf  does NOT already classify as spam gets marked as
       spam in the database (this stops	 the  database	getting	 too  heavily
       weighted):

	       # If sent to spambox@yourdomain.com:
	       :0
	       * ^To:.*spambox@yourdomain.com
	       {
		  :0 wf
		  | qsf -a

		  # The above two lines can be skipped if you've
		  # already piped the message through qsf.

		  # If the qsf database says it's not spam,
		  # mark it as spam!
		  :0 H
		  * ^X-Spam: NO
		  | qsf -am
	       }

       Remove  the  -a	option in the above examples if you don't want to use
       the allow-list.

       A more complicated filtering example - this will only run qsf on	 mes-
       sages  which  don't have a subject line saying "your <something> is on
       fire" and which don't have a sender address ending  in  "@foobar.com",
       meaning	that  messages	with that subject line OR that sender address
       will NEVER be marked as spam, no matter what:

	       :0 wf
	       * ! ^Subject: Your .* is on fire
	       * ! ^From: .*@foobar.com
	       | qsf -ra

       For more on procmail(1)	recipes,  see  the  procmailrc(5)  and	proc-
       mailex(5) manual pages.

       A  couple of macros to add to your .muttrc file, if you use mutt(1) as
       a mail user agent:

	       # Press F5 to mark a message as spam and delete it
	       macro index <f5> "<pipe-message>qsf -am\n<delete-message>"
	       macro pager <f5> "<pipe-message>qsf -am\n<delete-message>"

	       # Press F9 to mark a message as non-spam
	       macro index <f9> "<pipe-message>qsf -aM\n"
	       macro pager <f9> "<pipe-message>qsf -aM\n"

       Again, remove the -a option in the above examples if you don't want to
       use the allow-list.

       Note, however, that the above macros won't work when operating on mul-
       tiple tagged messages. For that, you'd need something like this:

	       macro index <f5> ":set pipe_split\n<tag-prefix><pipe-message>qsf -am\n<tag-prefix><delete-message>\n:unset pipe_split\n"

       If you use qmail(7), then to get procmail working  with	it  you	 will
       need  to	 put  a line containing just DEFAULT=./Maildir/ at the top of
       your ~/.procmailrc file, so that procmail  delivers  to	your  Maildir
       folder  instead of trying to deliver to /var/spool/mail/$USER, and you
       will need to put this in your ~/.qmail file:

	       | preline procmail

       This will cause all your mail to be delivered via procmail instead  of
       being delivered directly into your mail directory.

       See  the	 qmail(7)  documentation  for  more  about mail delivery with
       qmail.

       If you use postfix(1), you can set up a	system-wide  mail  filter  by
       creating	 a user account for the purpose of filtering mail, populating
       that account's .qsfdb, and then creating a shell	 script,  to  run  as
       that user, which runs qsf on stdin and passes stdout to sendmail(8).

       Doing  this  requires some knowledge of postfix configuration and care
       needs to be taken to avoid mail loops.  One qsf user's full  HOWTO  is
       included in the doc/ directory with this package.

THE ALLOW-LIST
       A feature called the "allow-list" can be switched on by specifying the
       --allowlist or -a option.  This causes messages' "From:" addresses  to
       be  checked  against  a list of people you have said to allow all mes-
       sages from, and if a message's "From:" address is in the list,  it  is
       never  marked  as spam.	This means you can add all your friends to an
       "allow-list" and qsf will then never mis-file their messages - a quick
       way to do this is to use -a with -T (train); everyone in your non-spam
       folder who has sent you an email will be added to the allow-list auto-
       matically during training.
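
       The check described above can be pictured with the following Python
       sketch.  It is not qsf's own code: the function names are
       hypothetical, the allow-list is modelled as a plain set of MD5 hashes
       (TECHNICAL DETAILS notes that addresses are stored hashed), and
       lowercasing the address before hashing is an assumption.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr


def address_hash(address):
    """Hash an email address with MD5 (lowercasing is an assumption)."""
    return hashlib.md5(address.strip().lower().encode("utf-8")).hexdigest()


def sender_is_allowed(message_text, allowlist_hashes):
    """Return True if the message's From: address is on the allow-list."""
    headers = message_from_string(message_text)
    _name, address = parseaddr(headers.get("From", ""))
    # A missing or unparsable From: header is never treated as allowed.
    return bool(address) and address_hash(address) in allowlist_hashes


# Hypothetical allow-list containing one friend.
allowlist = {address_hash("friend@example.com")}
message = "From: A Friend <Friend@Example.com>\nSubject: lunch\n\nSee you.\n"
print(sender_is_allowed(message, allowlist))
```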

       You  can	 manually add and remove addresses to and from the allow-list
       using the -e (email) option. For instance, to add foo@bar.com  to  the
       allow-list, do this:

	       qsf -e foo@bar.com -M

       To remove bad@nasty.com from the allow-list, do this:

	       qsf -e bad@nasty.com -m

       And  to see whether someone@somewhere.com is in the allow-list or not,
       just do this:

	       qsf -e someone@somewhere.com

       In general, you probably always want  to	 enable	 the  allow-list,  so
       always  specify the -a option when using qsf.  This will automatically
       maintain the allow-list based on what you classify  as  spam  or	 non-
       spam.

       The  only  times you might want to turn it off are when people on your
       allow-list are prone to getting viruses or if a virus is causing email
       to be sent to you that is pretending to be from someone on your allow-
       list.

BACKUP AND RESTORE
       Because the database format is platform-specific, it is a good idea to
       periodically dump the database to a text file using qsf -D so that, if
       necessary, it can be transferred to another machine and restored	 with
       qsf -R later on.

       Also  note  that since the actual contents of email messages are never
       stored in the database (see TECHNICAL DETAILS), you can	safely	share
       your  qsf database with friends - simply dump your database to a file,
       like this:

	       qsf -D > your-database-dump.txt

       Once you have sent your-database-dump.txt to another person, they  can
       do this:

	       qsf -R < your-database-dump.txt

       They will then have an identical database to yours.

TECHNICAL DETAILS
       When a message is passed to qsf, any attachments are decoded, all HTML
       elements are removed, and the message text  is  then  broken  up	 into
       "tokens",  where	 a  "token"  is	 a single word or URL.	Each token is
       hashed using the MD5 algorithm (see below for why), and that  hash  is
       then used to look up each token in the qsf database.
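
       A minimal Python sketch of the tokenise-then-hash step follows.  It
       assumes that a token is simply a lowercased run of non-whitespace
       characters; the real tokeniser also decodes attachments and strips
       HTML, which is not shown here.  The function name hash_tokens is
       hypothetical.

```python
import hashlib
import re


def hash_tokens(text):
    """Split text into tokens and return the MD5 hex digest of each."""
    # Assumption: tokens are whitespace-separated and lowercased.
    tokens = re.findall(r"\S+", text.lower())
    return [hashlib.md5(tok.encode("utf-8")).hexdigest() for tok in tokens]


for digest in hash_tokens("Buy now http://example.com/offer"):
    print(digest)
```

       Only the digests would be used as database keys, so the words
       themselves never need to be stored.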

       For  full  details  of which parts of an email (headers, body, attach-
       ments, etc) are used to calculate the spam rating, see  the  TOKENISA-
       TION section below.

       Within  the  database,  each token has two numbers associated with it:
       the number of times that token has been seen in spam, and  the  number
       of  times it has been seen in non-spam.	These two numbers, along with
       the total number of spam and non-spam messages seen, are then used  to
       give  a	"spamminess"  value for that particular token.	This "spammi-
       ness" value ranges from "definitely not spammy"	at  one	 end  of  the
       scale,  through	"neutral" in the middle, up to "definitely spammy" at
       the other end.

       Once a "spamminess" value has been calculated for all of the tokens in
       the message, a summary calculation is made to give an overall "is this
       spam?"  probability rating for the message.  If the overall  probabil-
       ity is 0.9 or above, the message is flagged as spam.
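
       This manual does not give the exact formulae, so the Python sketch
       below substitutes a simple naive-Bayes style calculation purely as an
       illustration.  The function names, the clamping limits of 0.01 and
       0.99, and the combining rule are all assumptions and may differ from
       what qsf actually does; only the 0.9 threshold comes from the text
       above.

```python
def token_spamminess(spam_count, nonspam_count, total_spam, total_nonspam):
    """Estimate one token's spamminess: 0.0 (not spammy) to 1.0 (spammy)."""
    if spam_count == 0 and nonspam_count == 0:
        return 0.5  # never seen before: neutral
    spam_freq = spam_count / max(total_spam, 1)
    nonspam_freq = nonspam_count / max(total_nonspam, 1)
    value = spam_freq / (spam_freq + nonspam_freq)
    # Clamp so that no single token counts as absolute proof (assumption).
    return min(0.99, max(0.01, value))


def overall_probability(values):
    """Combine per-token values into a single probability."""
    spam_product = 1.0
    nonspam_product = 1.0
    for value in values:
        spam_product *= value
        nonspam_product *= 1.0 - value
    total = spam_product + nonspam_product
    # With very many tokens both products can underflow to zero.
    return 0.5 if total == 0.0 else spam_product / total


values = [
    token_spamminess(40, 1, 100, 100),  # seen mostly in spam
    token_spamminess(30, 2, 100, 100),  # seen mostly in spam
    token_spamminess(5, 5, 100, 100),   # seen equally in both
]
probability = overall_probability(values)
print("spam" if probability >= 0.9 else "not spam")
```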

       The "allow-list" works alongside the probability test.  If it is
       enabled (with the -a option) and the sender of the message is listed
       in the allow-list, the whole probability check is skipped and the
       message is not marked as spam.

       When training the database, a message  is  split	 up  into  tokens  as
       described  above,  and then the numbers in the database for each token
       are simply added to: if you tell qsf that a message is spam,  it	 adds
       one  to the "number of times seen in spam" counter for each token, and
       if you tell it a message is not spam, it adds one to  the  "number  of
       times  seen  in	non-spam"  counter  for each token.  If you specify a
       weight, with -w, then the number you specify is added instead of	 one.
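
       The counter update described above can be sketched in Python as
       follows.  The in-memory dictionary stands in for the real database,
       the name mark is hypothetical, and counting each listed token once
       per call is an assumption.

```python
from collections import defaultdict

# Maps a token to [times seen in spam, times seen in non-spam].
counts = defaultdict(lambda: [0, 0])


def mark(tokens, is_spam, weight=1):
    """Add `weight` to the spam or non-spam counter of every token."""
    column = 0 if is_spam else 1
    for token in tokens:
        counts[token][column] += weight


mark(["cheap", "pills"], is_spam=True)               # like qsf -m
mark(["cheap", "flights"], is_spam=False, weight=2)  # like qsf -M -w 2
print(dict(counts))
```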

       To  stop	 the  database	growing	 uncontrollably,  it is automatically
       pruned after every 500th update (or, in training mode, after every  15
       rounds).	  The pruning step removes any tokens with very small message
       counts, and scales all counts down if they start getting too large.
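
       As an illustration only, the pruning step might look something like
       the Python sketch below.  The thresholds min_total and max_count, the
       halving rule, and the function name prune are assumptions; the manual
       does not state the values qsf really uses.

```python
def prune(counts, min_total=2, max_count=1000):
    """Drop rarely seen tokens, then halve all counts if any is too large."""
    kept = {
        token: list(pair)
        for token, pair in counts.items()
        if pair[0] + pair[1] >= min_total
    }
    largest = max((max(pair) for pair in kept.values()), default=0)
    if largest > max_count:
        # Scale everything down together so the ratios stay similar.
        kept = {tok: [pair[0] // 2, pair[1] // 2] for tok, pair in kept.items()}
    return kept


# Token -> [times seen in spam, times seen in non-spam]; made-up numbers.
database = {"rare": [1, 0], "common": [1500, 20], "normal": [3, 4]}
print(prune(database))
```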

       Finally, the reason MD5 hashes were used is privacy.   If  the  actual
       tokens from the messages, and the actual email addresses in the allow-
       list, were stored, you could not share a single qsf  database  between
       multiple	 users	because	 bits  of everyone's messages would be in the
       database - things like emailed passwords, keywords  relating  to	 per-
       sonal  gossip,  and  so on.  So a hash is stored instead.  A hash is a
       "one-way" function; it is easy to turn a token into a  hash  but	 very
       hard  (some  might  say impossible) to turn a hash back into the token
       that created it.	 This means that you end up with a database  with  no
       personal information in it.

TOKENISATION
       When  a message is broken up into tokens, various parts of the message
       are treated in different ways.

       First, all header fields are discarded, except for the important ones:
       From, Sender, To, Reply-To, and Subject.

       Next, any MIME-encoded attachments are decoded.	Any attachments whose
       MIME type starts with "text/" (i.e.  HTML  and  text)  are  tokenised,
       after  having any HTML tags stripped.  Any non-textual attachments are
       replaced with their MD5 hash (such that two identical attachments will
       have the same hash), and that hash is then used as a token.

       In addition to single-word tokens from textual message parts, qsf adds
       doubled-up tokens so that word pairs get added to the database.	 This
       makes  the database a bit bigger (although the automatic pruning tends
       to take care of that) but makes matching more exact.
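
       The idea of doubled-up tokens can be shown with a short Python
       sketch.  Joining the two words with a space is an assumption about
       the pair format, and the function name with_word_pairs is
       hypothetical.

```python
def with_word_pairs(words):
    """Return the single-word tokens followed by adjacent word pairs."""
    pairs = [first + " " + second for first, second in zip(words, words[1:])]
    return list(words) + pairs


print(with_word_pairs(["buy", "cheap", "pills"]))
```

       A message of N words therefore yields up to N - 1 extra tokens, which
       is why the database grows somewhat.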

SPECIAL FILTERS
       As well as using the textual content of email to detect spam, qsf also
       uses  special  filters  which  create "pseudo-tokens" based on various
       rules.  This means that specific patterns, not just individual  words,
       can be used to determine whether a message is spam or not.

       For  example, if a message contains lots of words with multiple conso-
       nants, like "ashjkbnxcsdjh", then each time a word like that  is	 seen
       the  special  token  ".GIBBERISH-CONSONANTS."  is added to the list of
       tokens found in the message.  If it turns out that most messages	 with
       words that trigger this filter rule are spam, then other messages with
       gibberish consonant strings will be more likely to be flagged as spam.
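
       A filter of this kind could be sketched in Python as shown below.
       The rule used here (six or more consonants in a row, with "y" not
       counted as a consonant) is an assumption chosen for the example, not
       the rule qsf applies, and the function name is hypothetical.

```python
import re

# Assumption: six or more consonants in a row counts as gibberish.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{6,}", re.IGNORECASE)


def add_gibberish_tokens(words):
    """Return the words plus one pseudo-token per word with a long run."""
    tokens = list(words)
    for word in words:
        if CONSONANT_RUN.search(word):
            tokens.append(".GIBBERISH-CONSONANTS.")
    return tokens


print(add_gibberish_tokens(["hello", "ashjkbnxcsdjh", "world"]))
```

       The pseudo-token is then counted and weighted in the same way as any
       ordinary word.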

       Currently the special filters are:

       GTUBE  Flags any message containing the string
	      XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
	      as spam - useful for testing that your qsf installation is
	      working.

       ATTACH-SCR

       ATTACH-PIF

       ATTACH-EXE

       ATTACH-VBS

       ATTACH-VBA

       ATTACH-LNK

       ATTACH-COM

       ATTACH-BAT
	      Adds a token  for	 every	attachment  whose  filename  ends  in
	      ".scr",  ".pif",	".exe",	 ".vbs",  ".vba", ".lnk", ".com", and
	      ".bat" respectively (these are often viruses).

       ATTACH-GIF

       ATTACH-JPG

       ATTACH-PNG
	      Adds a token  for	 every	attachment  whose  filename  ends  in
	      ".gif", ".jpg" or ".jpeg", and ".png" respectively.

       ATTACH-DOC

       ATTACH-XLS

       ATTACH-PDF
	      Adds  a  token  for  every  attachment  whose  filename ends in
	      ".doc", ".xls", or ".pdf" respectively (these tend to  indicate
	      a non-spam email).

       SINGLE-IMAGE
	      Adds  a  token  if  the  message	contains exactly one attached
	      image.

       MULTIPLE-IMAGES
	      Adds a token if the message contains  more  than	one  attached
	      image.

       GIBBERISH-CONSONANTS
	      Adds  a token for every word found that has multiple consonants
	      in a row, as described above.  Spam often contains  strings  of
	      gibberish.

       GIBBERISH-VOWELS
	      Adds a token for every word found that has multiple vowels in a
	      row, eg "aeaiaiaeeio".

       GIBBERISH-FROMCONS
	      Like GIBBERISH-CONSONANTS, but only for the "From:" address  on
	      its own.

       GIBBERISH-FROMVOWL
	      Like  GIBBERISH-VOWELS, but only for the "From:" address on its
	      own.

       GIBBERISH-BADSTART
	      Adds a token for every word that starts with  a  bad  character
	      such as %.

       GIBBERISH-HYPHENS
	      Adds  a  token  for  every word with more than three hyphens or
	      underscores in it.

       GIBBERISH-LONGWORDS
	      Adds a token for every word with more than 30 but fewer than
	      60 characters in it.

       HTML-COMMENTS-IN-WORDS
	      Adds  a  token  for every HTML comment found in the middle of a
	      word.  Spam  often  contains  HTML  inside  words,  like	this:
	      w<!--dsgfhsdgjgh-->ord

       HTML-EXTERNAL-IMG
	      Adds  a  token for every HTML <img> (image) tag found that con-
	      tains :// (i.e.  it refers to an external image).

       HTML-FONT
	      Adds a token for every HTML <font> tag found.

       HTML-IP-IN-URLS
	      Adds a token for every URL found containing an IP address.

       HTML-INT-IN-URL
	      Adds a token for every URL found containing an integer  in  its
	      hostname.

       HTML-URLENCODED-URL
	      Adds  a  token  for  every URL found containing a % sign in its
	      hostname.

       Normally, filters will just cause a  token  to  be  added,  and	these
       tokens  are  processed by the normal weighting algorithm.  However the
       GTUBE filter will immediately  flag  any	 matching  message  as	spam,
       bypassing the token matching.

DATABASE BACKENDS
       The  inbuilt binary tree database backend ("btree") will not necessar-
       ily provide the best performance, but is	 provided  because  using  it
       requires no external libraries.

       If,  when qsf was compiled, the correct libraries were available, then
       it will be possible to use qsf with alternative database backends.  To
       find out which backends you have available, run qsf -V (capital V) and
       read the second line of output.	To see how well a  backend  performs,
       collect	some spam and non-spam and use qsf -d BACKEND -B SPAM NONSPAM
       (see the entry for -B above).

       Some people find that they get the best performance out	of  the	 gdbm
       backend; this is a library that is widely available on many systems.

       To  efficiently share a qsf database across multiple machines, you may
       find the MySQL backend useful.  However, using it  is  a	 little	 more
       complicated.

       To use the MySQL backend you will need to create a table with the
       fields key1, key2, token, value1, value2 and value3.  The token field
       must be VARCHAR(64), and the value1, value2 and value3 fields must
       each be BIGINT or INT; indexing on the token field is a good idea.
       The key1 and key2 fields can be of any type, but they must be present.

       For example:

		USE mydatabase;
		CREATE TABLE qsfdb (
		  key1	    BIGINT UNSIGNED NOT NULL,
		  key2	    BIGINT UNSIGNED NOT NULL,
		  token	    VARCHAR(64) DEFAULT '' NOT NULL,
		  value1    INT UNSIGNED NOT NULL,
		  value2    INT UNSIGNED NOT NULL,
		  value3    INT UNSIGNED NOT NULL,
		  PRIMARY KEY (key1,key2,token),
		  KEY (key1),
		  KEY (key2),
		  KEY (token)
		);

       The key1 and key2 fields allow you to have multiple qsf	databases  in
       one table, by specifying different key1 and key2 values on invocation.

       Instead of specifying a database file with the --database / -d option,
       you  must specify either a specification string as described below, or
       the name of a file containing such a string on its first line.

       The specification string is as follows:

		database=DATABASE;host=HOST;port=PORT;
		user=USER;pass=PASS;table=TABLE;
		key1=KEY1;key2=KEY2

       This string must be all on one line, with no spaces.

       DATABASE
	      is the name of the MySQL database.

       HOST   is the hostname of the database server (eg "localhost").

       PORT   is the TCP port to connect on (eg 3306).

       USER   is the username to connect with.

       PASS   is the password to connect with.

       TABLE  is the database table to use.  If a table with this  name	 does
	      not  exist  when qsf is called in update or training mode, then
	      it will be created if permissions allow this to be done.

       KEY1   is the value to use for the key1 field.

       KEY2   is the value to use for the key2 field.

       Since command lines can be seen in the process list,  it	 is  probably
       best  to	 specify  a filename (eg qsf -d mysql:qsfdb.spec) and put the
       specification string inside that file.
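
       Before handing a specification file to qsf, it can be handy to check
       that the string is well formed.  The Python sketch below is a
       hypothetical helper for that purpose; it says nothing about how qsf
       parses the string internally, and every value in the sample is made
       up.

```python
def parse_spec(spec):
    """Parse "name=value;name=value" into a dict, rejecting bad fields."""
    fields = {}
    for part in spec.strip().split(";"):
        if not part:
            continue  # tolerate a trailing semicolon
        name, separator, value = part.partition("=")
        if not separator or not name:
            raise ValueError("malformed field: %r" % part)
        fields[name] = value
    return fields


# Made-up values; the string must be a single line with no spaces.
spec = ("database=mail;host=localhost;port=3306;"
        "user=qsf;pass=secret;table=qsfdb;key1=0;key2=0")
print(parse_spec(spec)["table"])
```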

TROUBLESHOOTING
       If you have problems with qsf, please check the list  below;  if	 this
       does  not  help,	 go  to the qsf home page and investigate the mailing
       lists, or email the author.

       Nothing is being marked as spam.
	      First, use the -r option to switch on the X-Spam-Rating header,
	      and check that this header appears in email passed through qsf.
	      If it does not, then it is likely that qsf is not being run  at
	      all  -  check  your configuration of procmail(1) or its equiva-
	      lent.

	      If you are seeing X-Spam-Rating headers, and  different  emails
	      have different scores, then you may simply need to retrain your
	      database a little more.  Take more spam email and	 pass  it  to
	      qsf -m.

	      If  you  are seeing X-Spam-Rating headers but they all give the
	      same spam rating, then the most likely reason is	that  qsf  is
	      not  reading any database.  Make sure that whatever is process-
	      ing the email has read  permissions  on  /var/lib/qsfdb  and/or
	      ~/.qsfdb	- and make sure that, if you are using ~/.qsfdb, what
	      your database creator thought was ~ ($HOME) is the same  as  it
	      is for whatever is processing the email.

       Retraining sometimes takes a very long time.
	      With  the	 obtree	 backend  or 2-column MySQL or SQLite tables,
	      every 500th retrain (-m or -M), the  database  is	 pruned.   On
	      some  systems this may take some time, and during this time the
	      database is locked (except when using the MySQL or SQLite back-
	      ends).   If  you	constantly do a lot of retraining and want to
	      avoid this, then use the -N option  to  suppress	auto-pruning,
	      and  then	 have  a  cron(8) job or something run a manual prune
	      (qsf -p) every now and again.

       Running qsf from procmail fails with an error.
	      If you can run qsf from the command line, but in your  procmail
	      log  file	 you  get  errors  about  "qsf: cannot execute binary
	      file", then contact your system administrator for help. It  may
	      be  that incoming email is handled by a different server to the
	      one you normally shell into, and either they are of a different
	      architecture  or	operating  system,  or the mail server is not
	      permitted to execute user-owned binaries.

ACKNOWLEDGEMENTS
       The following people have contributed suggestions, comments,  patches,
       and testing:

	      Tom Parker <http://www.bits.bris.ac.uk/palfrey/>
	      Dr Kelly A. Parker
	      Vesselin Mladenov <http://www.antipodes.bg/>
	      Glyn Faulkner
	      Mark Reynolds
	      Sam Roberts
	      Scott Allen
	      Karsten Kankowski
	      M. Kolbl
	      Micha Holzmann
	      Jef Poskanzer <http://www.acme.com/jef/>
	      Clemens Fischer <http://ino-waiting.gmxhome.de/>
	      Nelson A. de Oliveira

AUTHOR
       The author:

	      Andrew Wood <andrew.wood@ivarch.com>
	      http://www.ivarch.com/

       Project home page:

	      http://www.ivarch.com/programs/qsf/

BUGS
       If you find any bugs, please contact the author, either by email or by
       using the contact form on the web site.

SEE ALSO
       procmail(1), procmailrc(5), procmailex(5)

       Someone has written a guide to using qsf with KMail that can be	found
       at:
       http://www.softwaredesign.co.uk/Information.SpamFilters.html

LICENSE
       This is free software, distributed under the ARTISTIC license.

Linux				   May 2005			       QSF(1)
