

Full text search indexing
=========================


Dovecot v1.1 supports two FTS indexers: Squat and Lucene. It's also possible to use both of them at the same time. 


Squat
=====


Squat indexes allow quick searches for substrings, which is their main advantage. As far as I know, no other open source full text search indexer supports this. The IMAP SEARCH command requires substring matching, so Squat indexes are the only choice if you want to speed up SEARCHes for standard IMAP clients. 
Squat works by building a trie of every four-character combination found in the messages; looking up a combination gives a list of UIDs of the messages that contain it. This list isn't the final result, but it limits the messages that Dovecot actually has to open and search through. Because only four-character combinations are indexed, Dovecot can't optimize searches for words of 1-3 characters. 
The Squat name comes from Cyrus IMAP, which implements somewhat similar Squat indexes ("Search QUery Answer Tool"). Dovecot's implementation and file format are completely different, however. The main visible difference is that Dovecot can update the index incrementally instead of having to re-read the entire mailbox to rebuild it. 
The current Squat implementation isn't perfect yet, though. There are plans to make it faster and use less disk space. 
An alternative approach to supporting substring searches would be to build a binary search tree of all words and then a substring search index over those words. 
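
The four-character indexing described above can be illustrated with a small Python sketch. It is not Dovecot's implementation or file format: it uses a plain dictionary instead of a trie, lowercases everything as a simplifying assumption, and all names (build_index, candidate_uids, search) are made up for illustration. It only shows why the UID list is a candidate set that still has to be verified, and why queries shorter than four characters get no benefit.

```python
from collections import defaultdict

N = 4  # length of the indexed character combinations


def build_index(messages):
    """Map every 4-character substring to the set of UIDs containing it.

    `messages` is a dict of UID -> message text (hypothetical input format).
    """
    index = defaultdict(set)
    for uid, text in messages.items():
        text = text.lower()
        for i in range(len(text) - N + 1):
            index[text[i:i + N]].add(uid)
    return index


def candidate_uids(index, all_uids, query):
    """Return the UIDs that may contain `query` as a substring."""
    query = query.lower()
    if len(query) < N:
        # Nothing shorter than N characters is indexed, so every
        # message remains a candidate.
        return set(all_uids)
    result = None
    for i in range(len(query) - N + 1):
        uids = index.get(query[i:i + N], set())
        result = set(uids) if result is None else result & uids
    return result


def search(messages, index, query):
    """Open only the candidate messages and verify the real substring match."""
    candidates = candidate_uids(index, messages.keys(), query)
    needle = query.lower()
    return sorted(uid for uid in candidates if needle in messages[uid].lower())
```

A message such as "abcd bcde" contains both four-character pieces of the query "abcde" without containing the query itself, so it shows up in the candidate list and is only rejected by the final check in search().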


Configuration
=============



---%<-------------------------------------------------------------------------
protocol imap {
..
  mail_plugins = fts fts_squat
}
...
plugin {
  fts = squat
}
---%<-------------------------------------------------------------------------



Lucene
======


Lucene indexes are used only for non-standard IMAP SEARCH keys, so they're useful only to clients that have been modified to send them, such as customized webmail applications. Instead of using the "TEXT" and "BODY" search keys, use "X-TEXT-FAST" and "X-BODY-FAST". 
Dovecot builds only a single Lucene index for all mailboxes. In the future this could make it possible to search messages quickly across all mailboxes. 
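
A modified client mostly needs to swap the search keys. The Python sketch below shows one way to do that; the helper name to_fast_criteria and the (key, value) input format are assumptions made for this example, and only the TEXT and BODY mappings come from the text above.

```python
# Standard search key -> non-standard key answered from the Lucene index.
FAST_KEYS = {"TEXT": "X-TEXT-FAST", "BODY": "X-BODY-FAST"}


def to_fast_criteria(pairs):
    """Turn (key, value) pairs into IMAP SEARCH criteria tokens.

    TEXT and BODY are replaced by their X-*-FAST variants; every other
    key is passed through unchanged. Values are sent as quoted strings.
    """
    criteria = []
    for key, value in pairs:
        criteria.append(FAST_KEYS.get(key.upper(), key))
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        criteria.append('"%s"' % escaped)
    return criteria


# With the standard imaplib module the tokens could then be passed on as
#   conn.search(None, *to_fast_criteria([("TEXT", "invoice")]))
# where conn is an already authenticated imaplib.IMAP4 object with a
# mailbox selected, talking to a server that has fts_lucene enabled.
```

A server without the Lucene plugin will not recognise these keys, so a client would likely want to fall back to plain TEXT and BODY when the search is rejected.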


Configuration
=============



---%<-------------------------------------------------------------------------
protocol imap {
..
  mail_plugins = fts fts_lucene
}
...
plugin {
  fts = lucene
}
---%<-------------------------------------------------------------------------



Both
====


It's also possible to use both Squat and Lucene indexes. Lucene will be used for X-TEXT-FAST and X-BODY-FAST searches while Squat will be used for TEXT, BODY and HEADER searches. 

---%<-------------------------------------------------------------------------
protocol imap {
..
  mail_plugins = fts fts_squat fts_lucene
}
...
plugin {
  fts = squat lucene
}
---%<-------------------------------------------------------------------------

(This file was created from the wiki on 2007-12-11 04:42)
