- Changes for 0.9

INCOMPATIBLE CHANGE: ParseRecord and HeaderFooter now take an attrs
dictionary, for use with startElement.  (This should have been done
last summer with the changes for Group to take an attr.)  There was no
easy and elegant way to make a backwards compatible solution, but I
did make it give a TypeError if you pass in what would be the proper
term for the old API.  To make old code work, use {} for attrs.

Fast path optimizations in iteration (for 'characters') - 10%
performance boost for my test case.

There is now a default iterator boundary tag: 'record'

It's possible for an expression to go to completion but allow some
text to remain unparsed.  This now throws a new exception (a subtype
of the old one) to allow the handlers to do something different for
that case.  This is used for the Bioformat format recognition code.

Martel.SimpleRecordFilter is used by the Bioformat code to write a
quick test filter, to determine if more identification work should be
done.

Added Martel.NoCase to produce a case insensitive version of the given
expression, as in
  >>> import Martel
  >>> print str(Martel.NoCase(Martel.Str("JAN")))
  [Jj][Aa][Nn]
  >>>

Added Martel.Debug to print a message if the matching reaches that
part of the pattern.

The IterParser class handles iteration of records when on the
record boundary.  Uses 'yield' so only works with 2.2.  The older
Iterator.py classes still work for iteration support on pre-2.2
installations.

The Dispatch class help with "well-formed" ContentHandler calls by
mapping elements like <spam> into 'start_spam'.  The Parser class
contains special-case performance code to handle Dispatch calls.

New function 'replace_groups' takes an expression and a list of
replacement (old_tag, new_expression).  It replaces the contents of
the matching group with the new expression.

  >>> exp = Martel.Group("spam", Martel.Str("viking"))
  >>> exp = Martel.replace_groups(exp, [("spam", Martel.Str("rabbit"))])
  >>> from xml.sax import saxutils
  >>> p = exp.make_parser()
  >>> p.setContentHandler(saxutils.XMLGenerator())
  >>> p.parseString("rabbit")
  <spam>rabbit</spam>
  >>>
	
Added:
  Martel.UntilSep = read up to a seperator character, don't consume it
  Martel.UntilEol = read up to a newline character, don't consume it
  Time.make_expression("%(MM)") parses '01' .. '12' (must have two digits)
  Time.make_expression("%(DD)") parses '01' .. '31' (must have two digits)
  Chaged ToSep and DelimitedFields to take a 'sep' instead of a 'delimiter'


- Changes for 0.8

Added new, standard definitions
  Martel.Punctuation = Any(string.punctuation)
  Martel.Unprintable = AnyBut(string.printable)
  Martel.Word = \w+
  Martel.Spaces = [\t\v\f ]+  (whitespace expect newline characters)
  Martel.ToSep = read up to a seperator character
  Martel.DelimitedFields = read field sepearted characters, up to a \R

Renamed
  Martel.Integer to Martel.Digits
  Martel.SignedInteger to Martel.Integer

Both the additions and the renames take an optional name and
attributes, which are used for a Group around the term.

Added a new type of Expression -- NullOp.  This simplified
the implementation of Time.py
 
New submodule "Time.py" for building patterns and/or expressions for
parsing date/times.
 
Added "LAX" as a new way to handle "simple" XML records.

XX ContentHandler Factory
 
Bug fixed! - someone in personal email pointed out the named
group backreferences ("(?P=name)" construct) weren't working.
Turned out I didn't even have a regression test for that
case.  Both problems now fixed.

Bug fixed! - Brad pointed out the debug code didn't trim to the
min/max sizes of the string, so negative indexing sometimes caused
large and useless output.

Regression tests added for all the new code.
 
Some cleanup here and there.
 
- Changes during CVS

Group attributes:
  Group takes an optional 'attrs' object
  (?P<name?key=value)

rename 'sre_*.py' to 'msre_*.py'

Iterators
  - make iterator




- Changes between 0.5 and 0.4

Bug fix where HeaderFooter and ParseRecords weren't copying their
subexpressions when making a .copy()

Added SignedInteger and Float definitions.

Fixed some problems with the error reporting. (Status message,
location offset, rewrote HeaderFooterParser.)

Replaced \n with \R in the various format definitions.

Added a lot of formats, although most are incomplete in that
additional fields could be parsed.

Fixed swissprot parsing so non-existant fields don't generate empty
tags.

- Changes between version 0.35 and 0.4

Allow Unix, Mac and DOS newline conventions in a file.  Can even be
mixed in the same file (which does happen in real life).

RecordReaders use mxTextTools to find the record begin & end
locations.  Gives about a 50% performance boost because it doesn't
need to split and rejoin the lines and because it can use Boyer-Moore.

RecordReader's constructor and 'remainder' method use/return a
lookahead buffer as a string rather than a list of lines.


- Changes between version 0.3 and 0.35

Migrated to Python 2.0 and its xml package.  No longer runs under
older (1.x) Pythons.

Added more RecordReaders (Until, CountLines, Nothing, Everything).

Changed the RecordReader protocol to seed the line buffer (in the
constructor) and to get the final state for the input file and line
buffer (using remainder()).  Needed to allow chaining of different
reader types as with headers and footers.

Added a HeaderFooter Parser for formats like Prosite and PIR which
have a header and/or a footer with records in between.

Renamed the StateTable exception to Parser exceptions and removed the
EOF exception.

Experimental Iterator support ("make_iterator") as an alternate for
the pure SAX callback method.

Improved error reporting.  make_parser and make_iterator takes an
optional "debug_level".  Better error location is available with
debug_level == 1 and if it == 2, print current match information to
stdout.  Warning: debug_level == 1 is about 11 times slower than
debug_level == 0, which is why it is off by default.

Support for both the 1.1 and 1.2 mxTextTools.

- Changes between version 0.25 and 0.3

Added documentation on the internals and on how to write a parser.

Renamed and moved Generate.StateTable to Parser.Parser

Renamed the various "ContentHandler" to "DocumentHandler."
ContentHandler was flat out the wrong method name for SAX.

The parser and exceptions now inherit from the xml.sax.saxlib classes.

To parse a string, use the "parseString" method.  The old "parse"
method now takes a system identifier string.  A system identifier is
the SAX way of saying URL.  (Note: this will change again with Python
2.0 and the InputSource class.)

The "generate_*" commands now manipulate lists directly instead of
passing around 'Parser' objects.

Added parsers which can read a record at a time (ParseRecord and the
RecordReader classes.)

Added the optimize module, which does some limited regexp expression
cleanups and optimizations.  Haven't tested the performance
differences yet.

Started a Prosite 16.0 parser.

- Changes between 0.2 to 0.25

Consistent naming schemes to distinguish between a regexp written as a
string (a "pattern"), a parse tree (an "expression") or an mxTextTools
table (a "tagtable").
  
The "Subpattern" Node was renamed to "Group" for naming consistency
with Plex.  "Any" was renamed "Dot".  "In" was renamed "Any.

Fixed several bugs when translating from an expression tree back to a
pattern string.

Added docstrings and comments.

Added type check for the external Plex-like functions, since I was
getting annoyed that the error for doing 'Opt("text")' instead of
'Opt(Str("text"))' occured during tagtable generation and was hard to
track down.

Moved self test code from the modules into the test/ directory.

Changed the regression code to raise an Assertion error when there was
a problem rather than just printing the error and continueing.
