$Id: mysql_drv.txt,v 1.8 2006/05/13 01:12:58 jonz Exp $

mysql_drv: MySQL storage driver for DSPAM <jonathan@nuclearelephant.com>
Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://www.nuclearelephant.com/projects/dspam/

LICENSE

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; version 2
of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

ABOUT

mysql_drv is a MySQL storage driver for DSPAM v3.0 and above.  This driver
enables DSPAM to read and write all token, signature, and statistics data
from a MySQL database.  The advantages of using a SQL backend are obvious:

- Centralized data storage
- Structured queries for information
- No need for context locking mechanisms
- Replication and other MySQL features

It is STRONGLY RECOMMENDED that you run DSPAM with MySQL v4.1 or greater, as
some critical issues were addressed including:

  - Support for full unsigned 2^^64 integer support
    This will allow DSPAM to store its token keys as numeric (8 bytes) instead
    of character (20 bytes). On older versions of MySQL, DSPAM is forced to 
    use character fields. Using the mysql_objects-4.1.sql script will 
    result in a database about 30-35% smaller and more optimized.

  - Support for subqueries
    This allows purge-4.1.sql to conditionally purge based on the user's
    training mode, which allows a cleaner and more granular purge.

  - Insert Speed
    4.1 Supports the use of ON DUPLICATE KEY which allows us to perform all
    token inserts with a single query. As a result, MySQL 4.1 is about 20%
    faster.

0. CHECK FOR 4.1 BUG

Some versions of 4.1 apparently have a bug with the way bigint's are handled. 
If you have this bug, you'll need to either upgrade to a better version of 
MySQL or use the 4.0 (speed, space) object scripts for DSPAM instead of the 4.1
scripts.

To check for this bug, log into MySQL and issue these statements:

create table dspam_test (token bigint(20));

insert into dspam_test values('3232933621568372736'),('5174632569486238062'),('5860192774298853742'),('8787532532089676142'),('8029914733239132526');

select token from dspam_test where token in('3232933621568372736','8787532532089676142','8029914733239132526','5174632569486238062','5860192774298853742');

Broken versions of MySQL will return only one row, whereas a working
version will return all five rows.

1. CONFIGURING DSPAM

To configure DSPAM to use mysql_drv, use the following arguments while running
DSPAM's configure:

--with-storage-driver=mysql_drv 
Tells DSPAM to use the mysql_drv driver

--with-mysql-libraries=/path/to/libs
Tells DSPAM where to find the MySQL client libraries.  They are usually
located in /usr/local/mysql/lib

--with-mysql-includes=/path/to/libs
Tells DSPAM where to find the MySQL headers.  They are usually located in
/usr/local/mysql/include

--enable-virtual-users
Tells DSPAM to create virtual user ids for each dspam user.  Use this if your 
users don't really exist on the system (e.g. via getpwuid) or if you're doing
something weird like sharing uids.

After configure has successfully finished, build and install DSPAM using the
instructions from DSPAM's readme.

2. CREATING MYSQL OBJECTS

Before mysql_drv will function, you must run one of the mysql_objects.sql files
(located in src/tools.mysql_drv) to create the table objects required by the 
driver.  Depending on whether you are looking to optimize for speed or disk 
space, you should run the appropriately named script.  See your mysql 
documentation for further information.

This script assumes that you have already created a database for DSPAM objects
and a user with full access to SELECT, INSERT, UPDATE, and DELETE.

If you plan on enabling virtual users (something you'll need to do if the users
don't actually exist on your system), also run virtual_users.sql.

NOTE: If you plan on having more than 32,768 users on the system, you'll want
to change 'smallint' to 'int' for all uid fields.

You may also wish to add this line to the mysqld portion of /etc/my.cnf:

set-variable = max_allowed_packet=8192000

This will extend the query size from 1MB to 4MB; some larger queries may
cause MySQL to drop connection without this set.

3. UPDATING DSPAM.CONF

mysql_drv needs to know how to connect to your MySQL database.  You will
need to specify this information in dspam.conf. This file already contains some
example data:

MySQLServer    /var/lib/mysql/mysql.sock
MySQLPort
MySQLUser      dspam
MySQLPass      changeme
MySQLDb        dspam
MySQLCompress  true

Or if you'd like to connect via TCP, use the IP address and port of the 
MySQL server. DSPAM checks to see if the first character of the HOSTNAME field 
is a slash, and if so will treat it like a socket file.

4. NIGHTLY PURGE

If you would like to purge the many stale tokens DSPAM will have lying around 
the database, you should run one of the provided purge scripts nightly.
The dspam_clean tool can be configured to perform more granular cleansing, 
such as that of uninteresting data. See DSPAM's README for more information 
about the dspam_clean tool.

If you are supporting TOE or TUM-mode users on your system, you will want to 
do one of the following:

  1. Preferences Extensions + MySQL 4.1 or Greater

     If you are running MySQL 4.1 or greater, it is recommended you activate
     preferences extension support and recompile. You may then use 
     purge-4.1.sql instead of purge.sql, which will skip certain types of
     purges for users with TOE/TUM specified in their preferences.

     NOTE: You should add a preference for any global users on your system, so
     that their data is purged as if TOE-based (since global user data is
     rarely updated). You can do this using:

     dspam_admin add pref [username] trainingMode TOE

  2. Global TOE Support or MySQL < 4.1

     If you are running an earlier version of MySQL or if you are using
     TOE globally (and not in individual user preferences), then you should
     use purge.sql, but remove the following lines:

     delete from dspam_token_data
        where @a-to_days(last_hit) > 90;

     This will prevent the purging of stale tokens, which could cause serious
     data loss in TOE databases (because tokens are never touched unless
     an error has occured).  All other purges should be safe even for TOE-mode 
     users.

If you will NOT be supporting TOE users on your system, you may simply run
the purge.sql script nightly, as-is.

5. TUNING

If you have a busy server, and find a lot of table locks, you may consider
making a few tweaks to the MySQL configuration. Alternatively, you may
consider innodb which performs row-level locking, but this is rarely necessary.

  Increase key_buffer_size. If you have the memory, try at least 256M or 512M.

  Increase table_cache. Try a higher value (some systems go as high as 1024).
  This is related to the max_connections option and allows many file descriptors
  to be shared among threads

  Increase myisam_sort_buffer_size to a higher value, depending on your memory.
  Ideally, a few hundred MB would be great, but at least 64M would be an
  improvement.

6. REPAIRING

If your database gets corrupt, you'll need to repair it. This could take a
long time, and so it may make sense to keep a hot backup somewhere. You can
run a command like this to repair the database:

mysqlcheck --all-databases --fast --auto-repair

ERRORS

Any SQL errors will be reported to LOGDIR/sql.errors as well as the standard
syslog facilities (although the query will be truncated).

QUESTIONS

Please contact the dspam-dev mailing list with any questions or constructive
feedback.

Jonathan A. Zdziarski
<jonathan@nuclearelephant.com>
