Instructions for hacking on Xapian
==================================

.. contents:: Table of contents

This file aims to help developers get started with working on
Xapian.  The documentation contains a section covering various internal
aspects of the library - this can also be found on the Xapian website
<http://www.xapian.org/>.

Extra options to give to configure:
===================================

Note: Non-developer configure options are described in INSTALL.

You will probably want to use some of these if you're going to be developing
Xapian.

--enable-assertions
	This enables compiling of assertion code which will throw
	Xapian::AssertionError if the code detects a violation of a
	precondition or postcondition, or if another consistency check fails.

--enable-assertions=partial
	This option enables a subset of the assertions enabled by
	"--enable-assertions", but not the most expensive.  The intention is
	that it should be suitable for use in a real-world system for tracking
	down problems without imposing too much of an overhead (but note that
	we haven't yet performed timings to measure the overhead...)

--enable-log
	This compiles in code which generates verbose debugging messages.  See
	"Debugging Messages", below.

--enable-maintainer-mode
	This tells configure to enable make dependencies for regenerating build
	system files (such as configure, Makefile.in, and Makefile) and other
	generated files (such as the stemmers and query parser) when required.
	These are disabled by default as some make programs try to rebuild them
	when it's not appropriate (e.g. BSD make doesn't handle VPATH except
	for implicit rules).  If you enable maintainer mode you probably need
	to use a better make program (GNU make is recommended).  You'll also
	need a non-cross-compiling C compiler for compiling the Lemon parser
	generator and the Snowball stemming algorithm compiler.  The configure
	script will attempt to locate one, but you can override the
	autodetection by passing CC_FOR_BUILD on the command line like so:

	./configure CC_FOR_BUILD=/opt/bin/gcc

--enable-documentation
	This tells configure to enable make dependencies for regenerating
	documentation files.  By default it uses the same setting as
	--enable-maintainer-mode.

Debugging Messages
==================

If you configure with --enable-log, lots of places in the code generate
debugging messages to tell us what they're up to - this information can be
very useful for debugging both the Xapian library and code which uses it.  But
the quantity of information generated is potentially vast so there's a
mechanism to allow you to select where to store the log and which types of
message you're interested in by setting environment variables.  You can:

 * set XAPIAN_DEBUG_LOG to be the path to a file that you would like debugging
   output to be stored in (to override the default of stderr).  The first
   occurrence of %% in the name will be replaced with the process-id.

 * set XAPIAN_DEBUG_FLAGS to the decimal value of a bitmap indicating the types
   of debugging message you would like to display (the default value is 0,
   which disables all debug messages).  To turn on message type N, bitwise OR
   XAPIAN_DEBUG_FLAGS with (1<<N) - e.g. for message type 3, OR with 8.  To
   turn on all types, set XAPIAN_DEBUG_FLAGS to -1 (which is all bits set in
   two's complement binary representation).  Each message gives its numerical
   type in the debug log output.

These environment variables only have any effect if you ran configure with the
--enable-log option.
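
As an illustration of the arithmetic, the value for XAPIAN_DEBUG_FLAGS can be
computed as in the following sketch.  The helper ``debug_flags_for()`` is
hypothetical (it is not part of Xapian), and it assumes message types are
numbered below 32:

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper (not part of Xapian): build the value to put in
// XAPIAN_DEBUG_FLAGS from a list of message type numbers.
inline unsigned long debug_flags_for(std::initializer_list<unsigned> types) {
    unsigned long flags = 0;      // 0 disables all debug messages
    for (unsigned n : types) {
        assert(n < 32);           // assumption: types fit in 32 bits
        flags |= 1UL << n;        // turn on message type n
    }
    return flags;
}
```

For example, ``debug_flags_for({3})`` gives 8, matching the example above, and
``debug_flags_for({0, 3})`` gives 9.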

Debugging memory allocations
============================

The testsuite can make use of valgrind to check for memory leaks and reads
from uninitialised memory during tests.  This restricts the platforms which
we can catch leaks on (valgrind currently supports x86, x86_64, and powerpc
Linux reliably, with other ports being investigated).  However Xapian contains
very little platform specific code (and most of what there is is Windows
specific) so even just testing with valgrind on one platform gives good
coverage.

If you have a new enough version of valgrind installed, it's automatically
detected by configure and used when running the testsuite.  The testsuite runs
more slowly under valgrind, so if you wish to disable this auto-detection you
can run configure with::

  ./configure VALGRIND=

Or you can disable use of valgrind during a particular run of "make check"
like so::

  make check VALGRIND=

Or disable it while running a test directly (under sh or bash)::

  VALGRIND= ./runtest ./apitest

Running test programs
=====================

To run all tests, use ``make check``.  You can also run just the subset of
tests which exercise the inmemory, remote progserver, remote TCP,
multi-database, quartz, or flint backends using ``make check-inmemory``, ``make
check-remoteprog``, ``make check-remotetcp``, ``make check-multi``,
``make check-quartz``, or ``make check-flint`` respectively.

Also, ``make check-remote`` will run the tests on both variants of the remote
backend, and ``make check-none`` will run those tests which don't use any
backend.  These are handy shortcuts when doing development work on a particular
backend.

The runtest script (in the tests subdirectory) takes care of the details of
running the test programs (including setting up the environment so they work
when srcdir != builddir and handling libtool dynamically linked binaries).  To
run a test program by hand (rather than via make) just use::

  ./runtest ./apitest

You can specify options and arguments.  Individual test programs optionally
take one or more test names as arguments, and you can also pass ``-v`` to get
more verbose output from failing tests, e.g.::

  ./runtest ./apitest -v deldoc1

If the number of the test is omitted, all tests with that basename are run,
so to run deldoc1, deldoc2, etc::

  ./runtest ./apitest deldoc

You can also use runtest to run a test program under gdb (or most other tools)::

  ./runtest gdb ./apitest -v deldoc1
  ./runtest valgrind ./apitest -v deldoc1

Some test programs take special arguments - for example, you can restrict
apitest to the flint backend using ``-bflint``.

There are a few environment variables which the testsuite harness checks for
which you might find useful:

  XAPIAN_TESTSUITE_SIG_DFL:
    By default, the testsuite harness catches signals and handles them
    gracefully - the current test is failed, and the testsuite moves onto the
    next test.  If you want to suppress this (some debugging tools may work
    better if the signal is not caught) set the environment variable
    XAPIAN_TESTSUITE_SIG_DFL to any value to prevent the testsuite harness
    from installing its own signal handling.

  XAPIAN_TESTSUITE_OUTPUT:
    By default, the testsuite harness uses ANSI escape sequences to give
    colour output if stdout is a tty.  You can disable this feature by setting
    XAPIAN_TESTSUITE_OUTPUT=plain (alternatively, piping the output (e.g.
    through cat or more) will have the same effect).  Auto-detection can be
    explicitly specified with XAPIAN_TESTSUITE_OUTPUT=auto (or empty).  Any
    other value forces the use of colour.  Colour output is always disabled on
    Microsoft Windows, so XAPIAN_TESTSUITE_OUTPUT has no effect there.
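
The XAPIAN_TESTSUITE_OUTPUT rules can be summarised by the following sketch.
The function ``want_colour()`` is hypothetical and only mirrors the
description above (the real harness may be structured differently); the
Microsoft Windows case, where colour is always disabled, is left out:

```cpp
#include <cstring>

// Hypothetical helper mirroring the rules described above.  `value` is the
// contents of XAPIAN_TESTSUITE_OUTPUT (or nullptr if it is unset) and
// `stdout_is_tty` says whether stdout is a terminal.
inline bool want_colour(const char* value, bool stdout_is_tty) {
    if (value == nullptr || *value == '\0' ||
        std::strcmp(value, "auto") == 0) {
        return stdout_is_tty;     // auto-detect (the default)
    }
    if (std::strcmp(value, "plain") == 0) {
        return false;             // colour explicitly disabled
    }
    return true;                  // any other value forces colour
}
```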

Using various debugging, profiling, and leak-finding tools:
===========================================================

If you're using GCC 3.4 or newer, you can turn on debugging iterators, etc in
the GNU C++ STL by defining _GLIBCXX_DEBUG:

  ./configure CPPFLAGS=-D_GLIBCXX_DEBUG

For documentation of this option, see:
http://gcc.gnu.org/onlinedocs/libstdc++/debug.html

Note: all C++ code must be compiled with this defined or you'll get problems -
Xapian 0.9.7 and later add a suitable check to xapian/version.h to prevent you
making this mistake.

To use valgrind (http://www.valgrind.org/), no special build options are
required, but make sure you compile with debugging information (on by default
for GCC).  The valgrind documentation also recommends disabling optimisation
(with optimisation, line numbers in error messages can be confusing due to code
inlining, etc):

  ./configure CXXFLAGS='-O0 -g'

To use gdb (http://www.gnu.org/software/gdb/), no special build options are
required, but make sure you compile with debugging information (on by default
for GCC).  You'll probably find debugging easier if you compile without
optimisation (with optimisation, line numbers in error messages can be
confusing due to code inlining, etc, and the values of some variables can't be
printed because they've been eliminated from the code completely):

  ./configure CXXFLAGS='-O0 -g'

To enable profiling for gprof:

  ./configure CXXFLAGS=-pg LDFLAGS=-pg

To use Purify (a proprietary tool):

  ./configure CXXLD='purify c++' --disable-shared

To use Insure (another proprietary tool):

  ./configure CXX=insure

If you have runes for using other tools, please add them above, or send them
to us so we can.

Building from SVN:
==================

If you're building code from SVN, you'll want to configure with::

  ./configure --enable-maintainer-mode

This will be done for you if you use the top-level bootstrap script and then
run the top-level configure this produces (see below for more information about
this).  If you don't enable maintainer mode, then rules to rebuild generated
sources are disabled (and similarly rules to build documentation are only
enabled by --enable-documentation, or --enable-maintainer-mode without
--disable-documentation).

The SVN repository does not contain any automatically generated files
(such as configure, Makefile.in, Lemon generated sources, etc) because
experience shows it's best to keep these out of version control.  This
means that if you check the sources out of SVN, before you can successfully
run the normal build process you'll need to have several programs installed
so these files can be generated.  Note that you can avoid needing to have
these programs installed by using the SVN snapshots available from the
"Bleeding Edge" page of the Xapian website.  These snapshots are bootstrapped
tarballs much like any release version.

At the time of writing, these programs are autoconf, automake, and libtool.
Some older versions of these programs may not work correctly; we currently
require the following versions:

	autoconf (GNU Autoconf) 2.59
		2.57 fixes the annoying chmod warning on FreeBSD.  automake
		1.8.5 needs at least 2.58, and 2.59 was released the same day
		as 2.58 to fix a problem.  Currently snapshots and release
		tarballs are generated with autoconf 2.61 but this isn't yet a
		hard requirement (because the spec file for building RPMs
		currently runs autoreconf to avoid a libtool bug with setting
		rpath for /usr/lib64, and such platforms may still only have
		autoconf 2.59).

	automake (GNU automake) 1.8.3
		automake 1.5 adds support for AM_CXXFLAGS, but doesn't work
		with "make check" with Solaris make - the problem is with the
		rules to build tests/internaltest (perhaps no longer relevant
		as those rules are simpler now).

		Note that automake 1.6 has a bug which causes it to emit
		spurious warnings: this is fixed in automake 1.6.1.

		automake 1.7 and 1.8 work too (we required 1.8.5 for ages).

		automake 1.9's NEWS file suggests it will benefit us with
		smaller Makefile.ins amongst other things.

		automake 1.10 requires autoconf 2.60.  Currently snapshots and
		release tarballs are generated with automake 1.10, but this
		isn't a hard requirement - configure.ac requires at least
		1.8.3, but only because this is the version which SLES 9 had
		and the RPM spec file runs autoreconf (RHEL 4 had 1.9.2; Debian
		sarge had 1.9.5).  Please use a newer version (1.9.x or 1.10)
		for development if at all possible.

	GNU libtool 1.5.24
		libtool 1.5 was the first version to properly support linking
		C++ libraries, and 1.5.24 is largely 1.5 plus bug fixes and
		portability enhancements.  (Note: nothing actually enforces
		the requirement for 1.5.24, but this is the version which
		snapshots and release tarballs are currently bootstrapped
		with).

Please tell us if you find that older or newer versions of any of these
tools work or fail to work.

We have provided a simple script (bootstrap) to run these programs for you
on all the xapian modules you've checked out of SVN to produce a source tree
like that you'd get from unpacking the result of "make dist".  bootstrap is
in SVN in the level above xapian-core, etc.  Running bootstrap generates
a configure script in the top level which allows you to configure xapian-core
and any other modules you've checked out with one command.

The bootstrap script should be run from its source directory (i.e. from the
directory containing it).  The configure script generated by it supports
building in a separate directory to the sources: simply create the directory
you want to build in, and then run the configure script from inside that
directory.  For example, to build in a directory called "build" (starting in
the top level source directory)::

  ./bootstrap
  mkdir build
  cd build
  ../configure

When running bootstrap, you may need to add extra macro directories to the path
searched by aclocal (which is part of automake) - you can do this by specifying
these in the ACLOCAL_FLAGS environment variable, e.g.::

  ACLOCAL_FLAGS=-I/extra/macro/directory ./bootstrap

There is a good GNU autotools tutorial at
<http://www-src.lip6.fr/homepages/Alexandre.Duret-Lutz/autotools.html>.

If you are tracking development in SVN, there will sometimes be changes to the
build system sources which require regeneration of the generated makefiles and
associated machinery.  We aim to make the build system automatically regenerate
the necessary files, but in the event that a build fails after an update, it
may be worth re-running the bootstrap script to regenerate the build system
from scratch, before looking for the cause of the error elsewhere.

If you want to be able to build distribution tarballs (with "make dist") then
you'll also need some further tools.  The build system is designed to fail with
a suitable message if you lack any of the required tools (the alternative is to
build a tarball with various bits missing, which is best avoided - better to be
told to install pdflatex than to upload a tarball with no PDF manual).

These tools are:

* doxygen (v1.5.2 is used for snapshots and releases; 1.4.6 produced
  incomplete documentation for Xapian::Query)
* dot (part of the graphviz package)
* perl 5
* pdflatex (on Debian, tetex-extra is also required for fancyhdr.sty)
* makeindex (usually packaged with TeX)
* help2man
* rst2html or rst2html.py (on Debian/Ubuntu, this is provided by the
  python-docutils package)
* pngcrush (optional - used to reduce the size of PNG files in the HTML
  apidocs)

Building from SVN on Windows with MSVC:
---------------------------------------

The Windows build process is maintained in the xapian-maintainer-tools
directory in the Subversion repository.  See the win32msvc/README file in that
directory for details of how to build from Subversion.

Use of C++ Features:
====================

* STL:  We decided early on to embrace the C++ STL.  Some older compilers
  don't include full support for this.  Often we can work around this, for
  example:

  * Providing our own auto_ptr implementation (AutoPtr).
  * Using string::resize(0) instead of string::clear() (for GCC 2.95).
  * Avoiding use of '#include <limits>' (for GCC 2.95; GCC 3.0+ support it).

  There is now plenty of choice of compilers which provide good conformance to
  ISO C++, so if working around problems for some compiler proves too hard we
  should just document the issue and users will either have to upgrade to a
  more compliant compiler, or use another STL implementation such as STLPort
  (http://www.stlport.org/).

* C++ features we currently assume:

  * We assume <sstream> is available.  GCC < 2.95.3 didn't have it but GCC
    2.95.3 includes a backported version.  We aren't aware of any other
    compilers still in use which lack it.

  * Non-".h" versions of C++ headers.  We assume that <iostream> is available
    and that we don't ever have to use <iostream.h> instead.

* RTTI (dynamic_cast<>, typeid, etc):  Needing to use RTTI features in the
  library most likely indicates a design flaw, and you should avoid use
  of these features.  Where necessary, you can use a technique similar to
  Database::as_networkdatabase() to replace dynamic_cast<>.

* Exceptions:  In hindsight, throwing exceptions in the library seems to have
  been a poor design decision.  GCC on Solaris can't cope with exceptions in
  shared libraries, and we've also had test failures on other platforms which
  only occur with shared libraries - possibly with a similar cause.  Exceptions
  can also be a pain to handle elegantly in the bindings.  We intend to
  investigate modifying the library to return error codes internally, and then
  offering the user the choice of exception throwing or error code returning
  API methods (with the exception being thrown by an inlined wrapper in the
  externally visible header files).  With this in mind, please don't complicate
  the internal handling of exceptions...

* "using namespace std;" and "using std::XXX;" - it's OK to use these in
  applications, library code, and internal library headers.  But in externally
  visible headers (such as anything included by "#include <xapian.h>") you MUST
  use explicit "std::" qualifiers - it's not acceptable to pull anything from
  namespace std into the namespace of an application which uses Xapian.

* Use C++ style casts (static_cast<>, reinterpret_cast<>, and const_cast<>)
  in preference to C style casts.  The syntax is ugly, but they do make the
  intent much clearer which is definitely a good thing.

* std::pair<> with an STL class as one (or both) of the members can produce
  very long symbols (over 4KB!) after name mangling - long enough to overflow
  the size limits of some vendor compilers or toolchains (so this can affect
  GCC if it is using the system ld or as).  Even where the compiler works, the
  symbol bloat in an unstripped build is probably best avoided, so it's
  preferable to use a simple two member struct instead.  The code is probably
  more readable anyway, and easier to extend if more members are needed later.

* We try to avoid putting the full definition of virtual methods in header
  files.  This is because current compilers can't (as far as we know) inline
  virtual methods, so putting the definition in the header file simply slows
  down compilation (and, because method definitions often require further
  header files to be included, this can result in many more files needing
  recompilation after a change to a header file than is really necessary).
  Just put the declaration in the header file, and put the definition in a .cc
  file with the same basename.
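
To illustrate the RTTI point above, here is a minimal sketch of replacing
dynamic_cast<> with a virtual accessor.  All the class and method names are
hypothetical; this document only names Database::as_networkdatabase(), and the
sketch assumes that technique is a virtual method which returns a null pointer
unless a subclass overrides it.  For brevity the virtual methods are defined
inline here, although (as noted above) real code should put the definitions
in a .cc file:

```cpp
class RemoteDb;  // forward declaration of a hypothetical subclass

// Hypothetical base class.  Instead of callers using dynamic_cast<>, the
// base class offers a virtual accessor which returns a null pointer by
// default.
class Db {
  public:
    virtual ~Db() { }
    virtual RemoteDb* as_remote() { return nullptr; }
};

class RemoteDb : public Db {
  public:
    RemoteDb* as_remote() override { return this; }  // "I am a RemoteDb"
    int timeout() const { return 10; }               // RemoteDb-only member
};

class LocalDb : public Db { };

// Returns the timeout for a remote database, or -1 for any other kind.
inline int timeout_or_default(Db& db) {
    if (RemoteDb* remote = db.as_remote()) return remote->timeout();
    return -1;
}
```

The caller never needs typeid or dynamic_cast<>, and adding another kind of
database doesn't require any change to ``timeout_or_default()``.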

Miscellaneous Portability Issues:
=================================

Web Resources:
--------------

The "C++ FAQ Lite" covers many frequently asked C++ questions:
http://www.parashift.com/c++-faq-lite/

The libstdc++-porting-howto discusses various C++ portability issues:
http://gcc.gnu.org/onlinedocs/libstdc++/17_intro/porting-howto.html

<fcntl.h>:
----------

Don't directly '#include <fcntl.h>' - instead '#include "safefcntl.h"'.

The main reason for this is that when using certain compilers on certain
versions of Solaris, fcntl.h does '#define open open64'.  Sadly this breaks C++
code which has methods called open (as we do).  There's a cunning workaround
for this problem in common/safefcntl.h.

Also, safefcntl.h ensures that O_BINARY is defined (to 0 if not required) so
calls to open() and creat() can specify O_BINARY unconditionally for the
benefit of platforms which discriminate between text and binary files.

<windows.h>:
------------

Don't directly '#include <windows.h>' - instead '#include "safewindows.h"'
which reduces the bloat of header files included and prevents some of the
more egregious namespace pollution.  It also defines any constants we need
which might be missing in older versions of the mingw headers.

<winsock2.h>:
-------------

Don't directly '#include <winsock2.h>' - instead '#include "safewinsock2.h"'.
This ensures that safewindows.h is included before <winsock2.h>, so that
winsock2.h can't include windows.h without our workarounds for reducing
namespace pollution.

<errno.h>:
----------

Don't directly '#include <errno.h>' - instead '#include "safeerrno.h"' which
works around a problem with Compaq's C++ compiler.

<sys/select.h>:
---------------

Don't directly '#include <sys/select.h>' - instead '#include "safesysselect.h"'
which supports older UNIX platforms which predate POSIX 1003.1-2001 and works
around a problem on Solaris.

<sys/stat.h>:
-------------

Don't directly '#include <sys/stat.h>' - instead '#include "safesysstat.h"'
which under MSVC enables stat to work on files > 2GB, defines the missing
POSIX macros S_ISDIR and S_ISREG, pulls in <direct.h> for mkdir() (which is
provided by sys/stat.h under UNIX) and provides a compatibility wrapper for
mkdir() which takes 2 arguments (so code using mkdir can always just pass
two arguments).

<unistd.h>:
-----------

Don't directly '#include <unistd.h>' - instead '#include "safeunistd.h"'
- MSVC doesn't even HAVE unistd.h!

The various "safe" headers are maintained in xapian-core/common, but also used
by Omega.  Omega pulls in a copy using the svn:externals property which is
set on xapian-applications/omega.  Because of how this feature of SVN works,
we pull in a read-only copy via HTTP access to the main repository, so you
have to update it in xapian-core, and if you have ssh write access to the
repo but no HTTP access, this will fail.

The imported URL has to be absolute, which isn't too branch friendly.  To avoid
problems from this, we specify a particular revision to import, but this does
mean we need to monitor changes to xapian-core and decide when to update omega.
The release checklist includes a reminder to check this.

Warning Free Compilation:
-------------------------

Compiling without warnings on every platform is our goal, though it's not
always possible to achieve.  For example, GCC 2.95 produces a few bogus
warnings (e.g. about not returning a value from a non-void function),
and some GCC 3.x compilers produce the occasional bogus warning (e.g.
warning that a variable may be used uninitialised, despite it being initialised
at the point of declaration!)

If using GCC 3.0 or newer, you should consider configuring with::

  ./configure CXXFLAGS=-Werror

when doing development work on Xapian.  This promotes warnings to errors,
which should ensure you at least don't introduce new warnings for the compiler
you're using.

If you configure with --enable-maintainer-mode, and are using GCC 4.0 or newer,
this is done for you automatically.  This is intended to be an aid rather than
a form of automated punishment - it's all too easy to miss a new warning as
once a file is compiled, you don't see it unless you modify that file or one of
its dependencies.

With Intel's C++ compiler, --enable-maintainer-mode also enables -Werror.
If you know the equivalent of -Werror for other compilers, please add a note
here, or tell us so that we can add a note.

Branch Prediction Hints
=======================

GCC 3.0 and newer and Intel's C++ compiler both provide a mechanism for giving
the compiler hints to assist branch prediction (using __builtin_expect()).
Within the xapian-core library code, you can mark the expressions in ``if`` and
``while`` statements as ``rare`` (if the condition is rarely true) or ``usual``
(if the condition is usually true).

For example::

    if (rare(something_unusual())) deal_with_it();

    while (usual(!end_condition())) keep_going();

It's easy to make incorrect assumptions about where hotspots are and which
branches are usually taken or not, so except for really obvious cases (such
as ``if (!consistency_check()) throw_exception();``) you should benchmark
that new ``rare`` and ``usual`` hints help rather than hinder before committing
them to the repository.  It's also likely to be a waste of effort to add them
outside of areas of code which are executed very frequently.

Don't expect miracles - the first 15 uses added saved approximately 1%.

If you know how to implement the ``rare`` and ``usual`` macros for other
compilers, please let us know.
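
For reference, macros of this kind can be built on __builtin_expect() roughly
as in the following sketch.  This is one possible definition, assuming GCC 3.0
or newer (or a compiler which also defines __GNUC__); the definitions actually
used in xapian-core may differ, and ``sum_until_negative()`` is a hypothetical
function which is only there to show the macros in use:

```cpp
#if defined(__GNUC__) && __GNUC__ >= 3
// !!(COND) normalises any true value to 1, so that it can be compared with
// the expected value given as the second argument.
# define rare(COND) __builtin_expect(!!(COND), 0)
# define usual(COND) __builtin_expect(!!(COND), 1)
#else
// Fallback for other compilers: the hint is dropped, the condition is kept.
# define rare(COND) (COND)
# define usual(COND) (COND)
#endif

// Hypothetical function: the loop condition is usually true, and a negative
// value is assumed to be rare.
inline int sum_until_negative(const int* values, int count) {
    int total = 0;
    for (int i = 0; usual(i < count); ++i) {
        if (rare(values[i] < 0)) break;
        total += values[i];
    }
    return total;
}
```

The hints only affect how the compiler lays out the generated code; the
result of the function is the same with or without them.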

Configure Options
=================

Especially for a library, compile-time options aren't a good way to
integrate a new feature.  An increasingly large number of users install
pre-built binary packages rather than building from source, and unless the
package is capable of being split into modules, the packager has to choose a
set of compile-time options to use.  And they'll tend to choose either the
standard ones, or perhaps a broader set to try to keep everyone happy.  For a
library, similar issues occur when installing from source as well - the
sysadmin must choose the options which will keep all users happy.

Another problem with compile-time options is that it's hard to ensure that
a change doesn't break compilation under some combination of options without
actually building and running the test-suite on all combinations.  The fewer
compile-time options, the more likely the code will compile with every
combination of them.

So please think carefully before adding more compile-time options.  They're
probably OK for experimental features (but should go away once a feature is no
longer experimental).  Options to instrument a build for special purposes
(debug, profiling, etc) are also acceptable.  Disabling whole features probably
isn't (e.g. the --disable-backend-XXX options we already have are dubious,
though being able to disable the remote backend can be useful when trying to
get Xapian going on a platform).

Makefile Portability:
=====================

We don't want to force those building Xapian from the source distribution to
have to use GNU make.  Requiring GNU make for "make dist" isn't such a problem
but it's probably better to use portable constructs everywhere to avoid
problems when people move or copy code between targets.  If you do make use
of non-portable constructs where it's OK, add a comment noting the special
circumstances which justify doing so.

Here's an incomplete list of things to avoid:

* Don't use "$(RM)" - it's defined by GNU make, but using it actually harms
  portability as other makes don't define it.  Use plain "rm" instead.

* Don't use "%" pattern rules - these are GNU make specific.  Use an
  implicit rule (e.g. ".c.o:") if you can.  Otherwise, write out each version
  explicitly.

* Don't use "$<" except in implicit rules.  This is an annoying restriction,
  as using "$<" makes it much easier to make VPATH builds work.  But it's only
  portable in implicit rules.  Tips for rewriting - if it's a source file,
  write it as::

    $(srcdir)/foo.ext

  If it's a generated object file or similar, just write the name as is.  The
  tricky case is a generated file which isn't in SVN but is shipped in the
  distribution tarball, as such a file could be in either the source or build
  tree.  Use this trick to make sure it's found whichever directory it's in::

    `test -f foo.ext || echo '$(srcdir)/'`foo.ext

* Don't use "exit 0" to make a rule fail.  Use "false" instead.  BSD make
  doesn't like "exit 0" in a rule.

* Don't use make conditionals.  Automake offers conditionals which may be
  of use, and these are implemented to work with any make.  See the automake
  manual for details, and a few caveats.

* The list of portable utilities is:

    cat cmp cp diff echo egrep expr false grep install-info
    ln ls mkdir mv pwd rm rmdir sed sleep sort tar test touch true

  Note that versions of these (GNU versions in particular) support switches
  which aren't portable - notably, "test -r" isn't portable; neither is
  "cp -a".  And note that "mkdir -p" isn't portable - the semantics vary.
  See the "Goat Book" for more details and other useful tips:

    http://sources.redhat.com/autobook/

* Don't use "include" - it's not present in BSD make (at least some versions
  have ".include" instead, but that doesn't really seem to help...)  Automake
  provides a configure-time include, which may provide a replacement for some
  uses of "include".

* It appears that BSD make only supports VPATH for implicit rules (e.g. ".c.o:")
  - there's certainly a restriction there which is not present in GNU make.
  We used to try to work around this, but now we use AM_MAINTAINER_MODE to
  disable rules which are only needed by those developing Xapian (these were
  the rules which caused problems).  And we recommend those developing Xapian
  use GNU make to avoid problems.

* Rules with multiple targets can cause problems for parallel builds.  These
  rules are really just a shorthand for multiple rules with the same
  prerequisites and commands, and it is fine to use them in this way.  However,
  a common temptation is to use them when a single invocation of a command
  generates multiple output files, by adding each of the output files as a
  target.  E.g. if a SWIG language module generates xapian_wrap.cc and
  xapian_wrap.h, it is tempting to add a single rule something like::

    # This rule has a problem
    xapian_wrap.cc xapian_wrap.h: xapian.i
            SWIG_commands

  This can result in SWIG_commands being run twice, in parallel.  If
  SWIG_commands generates any temporary files, the two invocations can
  interfere causing one of them to fail.

  Instead of this rule, one solution is to pick one of the output files as a
  primary target, and add a dependency for the second output file on the first
  output file::

    # This rule also has a problem
    xapian_wrap.h: xapian_wrap.cc
    xapian_wrap.cc: xapian.i
            SWIG_commands

  This ensures that make knows that only one invocation of SWIG_commands is
  necessary, but could result in problems if the invocation of SWIG_commands
  failed after creating xapian_wrap.cc, but before creating xapian_wrap.h.
  Instead, we recommend creating an intermediate target::
  
    # This rule works in most cases
    xapian_wrap.cc xapian_wrap.h: xapian_wrap.stamp
    xapian_wrap.stamp: xapian.i
            SWIG_commands
            touch $@

  Because the intermediate target is only touched after the commands have
  executed successfully, subsequent builds will always retry the commands if an
  error occurs.  Note that the intermediate target cannot be a "phony" target
  because this would result in the commands being re-run for every build.

  However, this rule still has a problem - if the xapian_wrap.cc and
  xapian_wrap.h files are removed, but the xapian_wrap.stamp file is not, the
  .cc and .h files will not be regenerated.   There is no simple solution to
  this, but the following is a recipe taken from the automake manual which
  works.  For details of *why* it works, see the section in the automake manual
  titled "Multiple Outputs"::

    # This rule works even if some of the output files were removed
    xapian_wrap.cc xapian_wrap.h: xapian_wrap.stamp
    ## Recover from the removal of $@.  A full explanation of these rules is in
    ## the automake manual under the heading "Multiple Outputs".
            @if test -f $@; then :; else \
              trap 'rm -rf xapian_wrap.lock xapian_wrap.stamp' 1 2 13 15; \
              if mkdir xapian_wrap.lock 2>/dev/null; then \
                rm -f xapian_wrap.stamp; \
                $(MAKE) $(AM_MAKEFLAGS) xapian_wrap.stamp; \
                rmdir xapian_wrap.lock; \
              else \
                while test -d xapian_wrap.lock; do sleep 1; done; \
                test -f xapian_wrap.stamp; exit $$?; \
              fi; \
            fi
    xapian_wrap.stamp: xapian.i
            SWIG_commands
            touch $@

* This is actually a robustness point, not portability per se.  Rules which
  generate files should be careful not to leave a partial file in place if
  there's an error as it will have a timestamp which leads make to believe it's
  up-to-date.  So this is bad::

    foo.cc: script.pl
            $(PERL) script.pl > foo.cc

  This is better::

    foo.cc: script.pl
            $(PERL) script.pl > foo.tmp
            mv foo.tmp foo.cc

  Alternatively, pass the output filename to the script and make sure you
  delete the output on error or a signal (although this approach can leave
  a partial file in place if the power fails).  All the Makefile.am files and
  scripts in use were checked (and fixed if required) as of 2003-07-10, except
  for those in xapian-bindings, which were not checked.
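
The same write-then-rename idea can be used inside a generator program which
is passed its output filename.  The following is only a minimal sketch: the
helper name ``write_file_atomically`` and the ".tmp" suffix are assumptions
made for illustration and are not part of Xapian's build system.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: write `content` to `path` without ever leaving a
// partially written file at `path`.  Returns true on success.
bool write_file_atomically(const std::string& path, const std::string& content) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp.c_str(), std::ios::binary | std::ios::trunc);
        out << content;
        out.close();
        if (!out) {
            // Opening or writing failed: remove any partial temporary file
            // and leave an existing file at `path` untouched.
            std::remove(tmp.c_str());
            return false;
        }
    }
    // Only move the file into place once it is complete.
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}
```

On POSIX systems ``rename`` replaces an existing destination file; other
platforms may behave differently, so treat this as a starting point only.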

And lastly a style point - using "@" to suppress echoing of commands being
executed removes choice from the user - they may want to see what commands
are being executed.  And if they don't want to, many versions of make support
"make -s" to suppress the echoing of commands.

Using @echo on a message sent to stdout or stderr is acceptable (since it
avoids showing the message twice).  Otherwise don't use "@" - it makes it
harder to track down problems in the makefiles.

Use of Assert
=============

Use Assert to perform internal consistency checks, and to check for invalid
arguments to functions and methods (e.g. passing a NULL pointer when this isn't
permitted).  It should *NOT* be used to check for error conditions such as
file read errors, memory allocation failing, etc (since we want to perform such
checks in non-debug builds too).

File format errors should also not be tested with Assert - we want to catch
a corrupted database or a malformed input file in a non-debug build too.
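
As a rough illustration of this distinction, the sketch below validates a
made-up four character header.  The function ``parse_version`` and the header
format are hypothetical, and the standard ``assert`` macro stands in for
Xapian's Assert:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical header format: the letters "XAP" followed by one digit.
unsigned parse_version(const std::string& header) {
    // A malformed header is an input error which can happen in any build, so
    // it is reported with an exception rather than an assertion.
    if (header.size() != 4 || header.compare(0, 3, "XAP") != 0 ||
        header[3] < '0' || header[3] > '9') {
        throw std::runtime_error("malformed header");
    }
    unsigned version = static_cast<unsigned>(header[3] - '0');
    // Internal consistency check: this can only fail if the code above is
    // wrong, so it is acceptable for it to vanish in non-debug builds.
    assert(version <= 9);
    return version;
}
```

Here ``parse_version("XAP3")`` returns 3, while a string such as "BAD!" raises
``std::runtime_error`` whether or not assertions are compiled in.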

There are several variants of Assert:

- Assert(P) -- asserts that expression P is true.

- AssertRel(a,rel,b) -- asserts that (a rel b) is true - rel can be a boolean
  relational operator, i.e. one of ``==``, ``!=``, ``>``, ``>=``, ``<``,
  ``<=``.  The message given if the assertion fails reports the values of
  a and b, so ``AssertRel(a,<,b);`` is more helpful than ``Assert(a < b);``

- AssertEq(a,b) -- shorthand for AssertRel(a,==,b).

- AssertEqDouble(a,b) -- asserts a and b differ by less than DBL_EPSILON

- AssertParanoid(P) -- a particularly expensive assertion.  If you want a build
  with Asserts enabled, but without a great performance overhead, then pass
  --enable-assertions=partial to configure and AssertParanoids won't be
  checked, but Asserts will.  You can also use AssertRelParanoid and
  AssertEqParanoid.

- CompileTimeAssert(P) -- if P is a constant expression, CompileTimeAssert
  can be used to assert it is non-zero at compile-time - if P evaluates
  to zero, then the compilation will fail with an error.  CompileTimeAssert
  can only be used inside a function body.  There should be no runtime
  overhead for using CompileTimeAssert(), so CompileTimeAssert() is always
  enabled, regardless of whether --enable-assertions is passed to configure
  or not.
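
To show why reporting both operands is helpful, here is a simplified sketch of
a relational assertion.  ``SKETCH_ASSERT_REL``, ``eq_double`` and
``check_order`` are stand-ins invented for this example; Xapian's real macros
are defined differently and throw Xapian::AssertionError.  The
``static_assert`` line assumes a C++11 compiler and is shown only as a
comparable facility to CompileTimeAssert.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

// Simplified stand-in: report the expression text and both operand values.
// Note that this sketch evaluates A and B twice when the check fails.
#define SKETCH_ASSERT_REL(A, REL, B) \
    do { \
        if (!((A) REL (B))) { \
            std::ostringstream msg_; \
            msg_ << #A " " #REL " " #B " failed: " << (A) << " vs " << (B); \
            throw std::logic_error(msg_.str()); \
        } \
    } while (0)

// The comparison described for AssertEqDouble: differ by less than DBL_EPSILON.
inline bool eq_double(double a, double b) {
    return std::fabs(a - b) < DBL_EPSILON;
}

// Returns "ok", or the failure message so that it can be inspected.
std::string check_order(int a, int b) {
    // Compile-time check, placed inside a function body.
    static_assert(sizeof(char) == 1, "char must be one byte");
    try {
        SKETCH_ASSERT_REL(a, <, b);
    } catch (const std::logic_error& e) {
        return e.what();
    }
    return "ok";
}
```

With this sketch ``check_order(5, 3)`` yields the message
"a < b failed: 5 vs 3", which says more than a bare "a < b failed" would.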

Marking Features as Deprecated
==============================

In the API headers, a feature (a class, method, function, enum, typedef, etc)
can be marked as deprecated by using the XAPIAN_DEPRECATED() macro.  Note that
you can't deprecate a preprocessor macro.

For compilers with a suitable mechanism (currently GCC 3.1 or later, and
MSVC 7.0 or later) this causes compile-time warning messages to be emitted for
any use of the deprecated feature.  For compilers without support, the macro
just expands to its argument.

You must add this line to any API header which uses XAPIAN_DEPRECATED()::

    #include <xapian/deprecated.h>

When marking a feature as deprecated, document the deprecation in
docs/deprecation.rst.  When actually removing deprecated features, please tidy
up by removing the inclusion of <xapian/deprecated.h> from any file which no
longer marks any features as deprecated.

The XAPIAN_DEPRECATED() macro should wrap the whole declaration except for the
semicolon and any "definition" part, for example::

    XAPIAN_DEPRECATED(int old_function(double arg));

    class Foo {
      public:
        XAPIAN_DEPRECATED(int old_method());

        XAPIAN_DEPRECATED(int old_const_method() const);

        XAPIAN_DEPRECATED(static int old_static_method());

        XAPIAN_DEPRECATED(static const int OLD_CONSTANT) = 42;
    };

To avoid compilation errors with older GCC versions (noted with GCC 3.3.5),
you can't mark a method which is defined inline in a class with
XAPIAN_DEPRECATED (this works with recent GCC versions though)::
    
    class Foo {
      public:
        // This fails to compile with GCC 3.3.5, so don't do this!
        XAPIAN_DEPRECATED(int old_inline_method()) { return 42; }
    };
    
Instead rewrite like so::

    class Foo {
      public:
        XAPIAN_DEPRECATED(int old_inline_method());
    };

    inline int Foo::old_inline_method() { return 42; }

Submitting Patches:
===================

If you have a patch to fix a problem in Xapian, or to add a new feature,
please send it to us for inclusion.  Any major changes should be discussed
on the xapian-devel mailing list first:
<http://www.xapian.org/lists.php>

We find patches in unified diff format easiest to read.  If you're using an
SVN checkout just use "svn diff" to generate the diff.  If you're working
from a tarball, compare against the original versions of files using
"diff -puN" (-p reports the function name for each chunk).

Please set the width of a tab character in your editor to 8 spaces, and use
Unix line endings (i.e. LF, not CR+LF).  Failing to do so will make it much
harder for us to merge in your changes.

We don't currently have a formal coding standards document, but please try
to follow the style of the existing code.  In particular:

* Indent C++ code by 4 spaces for a new indentation level, and set your editor
  to tab-fill indentation.  As an exception, "public", "protected" and
  "private" declarations in classes and structs should be indented by 2 spaces,
  and the following code should be indented by 2 more spaces::

    class Foo {
      public:
        void method();
    };

  The rationale for this exception is that class definitions in header files
  often have fairly long lines, so losing an indent level to the visibility
  specifier tends to make class definitions less readable.

  The default visibility for a class is always "private", so there's no need
  to specify that explicitly.  If a class only contains public methods and
  data, consider declaring it as a "struct" (the only difference in C++ is
  that the default visibility for a struct is "public").

* Put a space before the "(" after control flow constructs like "for", "if",
  "while", etc.  Don't put a space before the "(" in function calls.  So
  write "if (strlen(p) > 10)" not "if(strlen (p) > 10)".

* Prefer "++i;" to "i++;", "i += 1;", or "i = i + 1;".  For simple integer
  variables these should generate equivalent (if not identical) code, but if i
  is an iterator object then the pre-increment form can be more efficient in
  some cases with some compilers.  It's simpler and more consistent to always
  use the pre-increment form (unless you make use of the old value which the
  post-increment form returns).  For the same reasons, prefer "--i;" to "i--;",
  "i -= 1;", or "i = i - 1;".

* Prefer "container.empty()" to "container.size() == 0" (and
  "!container.empty()" to "container.size() != 0" or "container.size() > 0").
  Finding the size of a container may not be a constant time operation for
  all containers (e.g. std::list may not be, and indeed isn't for GCC - see
  http://gcc.gnu.org/onlinedocs/libstdc++/23_containers/howto.html#6).  And
  the "empty()" form makes the intent of the test more explicit.

* Prefer not to use "else" when the control flow is diverted elsewhere at the
  end of the "if" block (e.g. by "return", "continue", "break").  This
  eliminates a level of indentation from the code in the "else" block, and
  typically makes the control flow logic clearer.  For example::

    if (x == 0) {
        foo();
        return;
    }

    while (x--) {
        bar();
    }

  rather than::

    if (x == 0) {
        foo();
        return;
    } else {
        while (x--) {
            bar();
        }
    }

* For standard ISO C headers, we now prefer the C++ form (e.g.
  "#include <cstdlib>" rather than "#include <stdlib.h>") for new code (a
  *lot* of the existing code currently uses the old form, but there are a few
  semantic differences, so we're holding off on a wholesale conversion until
  1.1.0 to avoid potentially disruptive changes with no direct benefit to
  users).

* For standard ISO C++ headers, *always* use the ISO C++ form '#include <list>'
  (pre-ISO compilers used '#include <list.h>', but GCC has generated a warning
  for this form for years).
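
Several of the points above are illustrated together in the following short
sketch.  The function ``join`` is a made-up example and is not taken from the
Xapian sources:

```cpp
#include <cassert>  // C++ form of the ISO C header, not <assert.h>
#include <list>
#include <string>

// Join the items with ", " between them.
std::string join(const std::list<std::string>& items) {
    // Use empty() rather than size() == 0.
    if (items.empty()) {
        return std::string();
    }
    // No "else" is needed after the early return above.

    std::list<std::string>::const_iterator i = items.begin();
    assert(i != items.end());
    std::string result = *i;
    // Space before "(" after "for"; pre-increment for the iterator.
    for (++i; i != items.end(); ++i) {
        result += ", ";
        result += *i;
    }
    return result;
}
```

For a list holding "a" and "b" this returns "a, b", and for an empty list it
returns an empty string.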

We will do our best to give credit where credit is due - if we have used
patches from you, or received helpful reports or advice, we will add your name
to the AUTHORS file (unless you specifically request us not to).  If you see we
have forgotten to do this, please draw it to our attention so that we can
address the omission.

Developers with SVN access:
===========================

People who are more seriously involved with the project are likely to
have write access to the SVN repository.  This section gives the conventions
for those developers, but most of these also apply if you're generating a
patch you'd like us to include.

1) Make sure that the documentation is updated
----------------------------------------------

 * API classes, methods, functions, and types must be documented by
   documentation comments alongside the declaration in ``include/xapian/*.h``.
   These are collated by doxygen - see doxygen's documentation for details
   of the supported syntax.

 * The documentation comments don't give users a good overview, so we also
   need documentation which gives a good overview of how to achieve particular
   tasks.

 * Internal classes, etc should also be documented by documentation comments
   where they are declared.

2) Make sure the tests are right
--------------------------------

 * If you're adding a feature, also add feature tests for it.  These both
   ensure that the feature isn't broken to start with and detect if later
   changes stop it working as intended.
 * If you've fixed a bug, make sure there's a regression test which
   fails on the existing code and succeeds after your changes.
 * Make sure all existing tests continue to pass.

If you don't know how to write tests using the Xapian test rig, then
ask.  It's reasonably simple once you've done it once.  There is a brief
introduction to the Xapian test system in ``docs/tests.html``.

3) Make sure the attributions are right
---------------------------------------

 * If necessary, modify the copyright statement at the top of any
   files you've altered. If there is no copyright statement, you may
   add one (there are a couple of Makefile.am's and similar that don't
   have copyright statements; anything that small doesn't really need
   one anyway, so it's a judgement call).  If you've added files, they
   should include the GPL boilerplate with your name only.
 * If you're not in there, add yourself to the AUTHORS file.

4) Create a ChangeLog entry and commit
--------------------------------------

 * Add an entry to the ChangeLog file at the top of the module.  The
   text of this can be identical to the SVN commit message.  The datestamps in
   our ChangeLog entries are as produced by the Unix date utility when invoked
   as::

     date "+%a %b %d %T %Z %Y"

 * Commit to the repository.

Then you can update any patch, bug or feature request items in Bugzilla
to indicate that they've been dealt with.

API Structure Notes
===================

We use reference counted pointers for most API classes.  These are implemented
using Xapian::Internal::RefCntPtr, the implementation of which is exposed for
efficiency, and because it's unlikely we'll need to change it frequently, if at
all.

For the reference counted classes, the API class (e.g. Xapian::Enquire) is
really just a wrapper around a reference counted pointer.  This points to an
internal class (e.g. Xapian::Enquire::Internal).  The reference counted
pointer is a member variable of the API class called internal.  Conceptually
this member is private, though it typically isn't declared as private (this
is to avoid littering the external headers with friend declarations for
non-API classes).
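
The overall shape of this pattern can be sketched as follows.  This is a
deliberately simplified illustration: ``RefCountBase``, ``RefPtr`` and
``Searcher`` are hypothetical names, the code is not thread-safe, and it is
not the implementation of Xapian::Internal::RefCntPtr.

```cpp
#include <cassert>
#include <string>

// Base class holding the reference count (intrusive counting).
class RefCountBase {
  public:
    unsigned ref_count;
    RefCountBase() : ref_count(0) { }
};

// Minimal reference counted pointer to a class derived from RefCountBase.
template<class T>
class RefPtr {
    T* ptr;
    void release() {
        if (ptr && --ptr->ref_count == 0) delete ptr;
    }
  public:
    explicit RefPtr(T* p = 0) : ptr(p) { if (ptr) ++ptr->ref_count; }
    RefPtr(const RefPtr& o) : ptr(o.ptr) { if (ptr) ++ptr->ref_count; }
    RefPtr& operator=(const RefPtr& o) {
        // Increment first so that self-assignment is safe.
        if (o.ptr) ++o.ptr->ref_count;
        release();
        ptr = o.ptr;
        return *this;
    }
    ~RefPtr() { release(); }
    T* operator->() const { assert(ptr); return ptr; }
    T* get() const { return ptr; }
};

// The API class is just a thin wrapper around the counted pointer.
class Searcher {
  public:
    class Internal;
    // Conceptually private, but left public to avoid friend declarations.
    RefPtr<Internal> internal;
    explicit Searcher(const std::string& name);
    const std::string& get_name() const;
};

// The internal class holds the real state.
class Searcher::Internal : public RefCountBase {
  public:
    std::string name;
    explicit Internal(const std::string& name_) : name(name_) { }
};

inline Searcher::Searcher(const std::string& name)
    : internal(new Internal(name)) { }

inline const std::string& Searcher::get_name() const {
    return internal->name;
}
```

Copying a ``Searcher`` here copies only the pointer and bumps the count, so
both copies share one ``Internal`` object which is deleted when the last copy
goes away.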

There are a few exceptions to the reference counted structure, such as
MSetIterator and ESetIterator which have an exposed implementation.  Tests show
this makes a substantial difference to speed (it's ~20% faster) in typical
cases of iterator use.

The postfix operator++ for iterators should be implemented inline in terms
of the prefix form as described by Joe Buck on the gcc mailing list
- excerpt from http://article.gmane.org/gmane.comp.gcc.devel:50201 ::

    class some_iterator {
    public:
	// ...
	some_iterator& operator++();

	some_iterator operator++(int) {
	    some_iterator tmp = *this;
	    operator++();
	    return tmp;
	}
    };

    The compiler is allowed to assume that the copy constructor only does
    a copy, and to optimize away unneeded copy operations.  The result
    in this case should be that, for some_iterator above, using the
    postfix operator without using the result should give code equivalent
    to using the prefix operator.

    Now, for [GCC 3.4], you'll find that the dead uses of tmp are only
    completely optimized away if tmp has only one data member that can fit in a
    register.  [GCC 4.0 will do] better, and you should find that this style
    comes very close to eliminating any penalty from "incorrect" use of the
    postfix form.

Xapian's PostingIterator, TermIterator, and PositionIterator all have only one
data member which fits in a register.
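
For reference, a complete version of the pattern quoted above might look like
this.  ``CountingIterator`` and ``check_forms_agree`` are hypothetical and
exist only for this example; like the Xapian iterators mentioned, the class
has a single data member which fits in a register.

```cpp
#include <cassert>

// Hypothetical iterator over consecutive integers.
class CountingIterator {
    int value;
  public:
    explicit CountingIterator(int start) : value(start) { }
    int operator*() const { return value; }

    // Prefix form does the real work.
    CountingIterator& operator++() {
        ++value;
        return *this;
    }

    // Postfix form is implemented inline in terms of the prefix form.
    CountingIterator operator++(int) {
        CountingIterator tmp = *this;
        operator++();
        return tmp;
    }
};

// Advance copies of `it` with each form and confirm that they agree.
void check_forms_agree(CountingIterator it) {
    CountingIterator a = it, b = it;
    ++a;
    CountingIterator old = b++;
    assert(*old == *it);  // postfix returns the old position
    assert(*a == *b);     // both forms advance by one
}
```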

Handy tips for aiding development
=================================

If you find you are repeatedly changing the API headers (in include/)
during development, then you may become annoyed that the docs/ subdirectory
will rebuild the doxygen documentation every time you run "make" since this
takes a while.  You can disable this temporarily (if you're using GNU make),
by creating a file "docs/GNUmakefile" containing these two lines::

    %:
            @echo "Skipping 'make $@' in docs"

Note that in the file itself the first line must not be indented, and the
whitespace at the start of the second line needs to be a single "tab"
character!

Don't forget to remove (or rename) this and check the documentation builds
before committing or generating a patch though!

How to make a release
=====================

This is a (hopefully complete) list of the jobs which need doing:

* Email Fabrice Colin and Tim Brody so they can check RPM packaging.

* Check the revision currently specified in the svn:externals property of
  xapian-applications/omega.  Unless there's a good reason, we should release
  xapian-core and omega with synchronised versions of the shared files.

* Make sure that any new/changed/removed API methods in xapian-core have been
  wrapped/updated/removed in xapian-bindings.

* Update the lists of deprecated/removed API methods in docs/deprecation.rst

* Update the NEWS files using information from the ChangeLog files

* Update the PLATFORMS file.  Don't forget to use reports from the tinderbox:
  http://www.oligarchy.co.uk/tinderbox/xapian/status.html

* Update the version in configure.ac for each module (xapian-core, omega, and
  xapian-bindings), and the library version info in xapian-core's configure.ac

* Move any bugs fixed by this release from "RESOLVED FIXED" -> "CLOSED"
  http://www.xapian.org/cgi-bin/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED
  Make sure the submitters are mentioned in the "thanks" list in AUTHORS.

* On ixion, svn tag the source trees for the new revision - use the
  svn-tag-release script, running it with the new version number, for
  example::

    xapian-maintainer-tools/svn-tag-release 0.9.0

  This script also generates tarballs for the new release and copies them
  across to the website.

* Add the new version to the list of versions in Bugzilla:
  http://www.xapian.org/cgi-bin/bugzilla/editversions.cgi?product=Xapian&action=add

* Update the 1.0.N tracker bug to track 1.0.N+1:
  http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=200

* Update the website: version.php in the CVS module www.xapian.org contains the
  latest version and the date it was released.

* Run /u1/olly/xapian-website-update/update_website.sh

* Update the wiki:

  Create a new page http://wiki.xapian.org/ReleaseNotes/X.Y.Z and link it into
  http://wiki.xapian.org/ReleaseNotes in place of the old current release link,
  which should be moved to the archived section.

  Also update the roadmap at http://wiki.xapian.org/RoadMap by recording the
  date of this release and adding an entry for the next release with an
  estimated release date.

* Update the freshmeat entry at:
  http://freshmeat.net/add-release/40427/43070/

* Announce the new version on xapian-discuss

* Have a nice cup of tea!

How to make Debian packages for a new release
=============================================

Debian control files are stored in the "debian" subdirectory of each module
for which packages have been produced (currently xapian-core, xapian-bindings
and xapian-applications/omega).  After each release, these should be
updated as follows:

* Update the debian/changelog file, being sure to keep it in the
  standard Debian format (the easiest way is to use the dch utility
  like so: "dch -v 0.9.7-1").  The new version number should be the
  version number of the release followed by "-1" (i.e. a Debian
  patch number of 1).  The changelog message should indicate that
  there is a new upstream release, and should mention any significant
  changes in the new release.

* If any patches are being applied when building the Debian package
  (i.e. there is a patch file "debian/patch"), and these patches are
  now incorporated into the release, remove or update the patch file.

* Use xapian-maintainer-tools/debian/svn-tag-debs to tag all the files in the
  debian control directory with the tag "debian-VERSION-1" - e.g. for a new
  release of version 0.9.6, tag with "debian-0.9.6-1".

* Use xapian-maintainer-tools/debian/make-source-packages to make and upload
  new source packages for the various Debian and Ubuntu versions we provide
  packages for.  Currently this script must be run on ixion.

* Build debs for oldstable, stable, unstable, dapper, edgy, feisty, and gutsy.
  The scripts xapian-maintainer-tools/debian/create-chroot and
  xapian-maintainer-tools/debian/build-packages allow building these in
  a series of chroots on a single machine using pbuilder.



.. vim: syntax=
