                          OpenToken Package Readme

                                Version 3.0b

The OpenToken package is a facility for performing token analysis and
parsing within the Ada language. It is designed to provide all the
functionality of a traditional lexical analyzer/parser generator, such as
lex/yacc. But due to the magic of inheritance and runtime polymorphism it
is implemented entirely in Ada as withed-in code. No precompilation step is
required, and no messy tool-generated source code is created.

Additionally, the technique of using classes of recognizers promises to
make most token specifications as simple as making an easy-to-read
procedure call. The most error-prone part of generating analyzers, the
token pattern matching, has been taken from the typical user's hands and
placed into reusable classes. Over time I hope to see the addition of
enough reusable recognizer classes that very few users will ever need to
write a custom one. Parse tokens themselves also use this technique, so
they ought to be just as reusable in principle, although there currently
aren't a lot of predefined parse tokens included in OpenToken.

Ada's type safety features should also make misbehaving analyzers and
parsers easier to debug. All this will hopefully add up to token analyzers
and parsers that are much simpler and faster to create, easier to get
working properly, and easier to understand.

History

Version 3.0b

This version contains another code reorganization to go with another new
parsing facility. This time it is recursive descent parsing. The new method
has the following advantages over table-driven parsers:

   * It's simpler to implement.
   * It provides many more opportunities for reuse.
   * Its parsers are debuggable.
   * There's no expensive parser-generation phase.

The disadvantage is:

   * Its parsers are most likely a bit slower.

Given the above balance, I do intend to make this the standard supported
parsing facility for future versions of OpenToken. The "b" designation is
there to indicate that some things might not be in quite their permanent
form yet, and that there isn't yet the full set of reusable tokens to
support it that I would like to see in a release. I'm hoping for feedback
both in the form of criticisms/suggestions, and reusable tokens in order to
help finalize this facility.
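
For readers who haven't met the technique, here is a minimal sketch of
recursive descent parsing in plain Ada. It does not use OpenToken's token
types or its Parse routines; every name in it (Descent_Demo, Peek, Factor,
Term, Expression, Syntax_Error) is made up for illustration. Each grammar
rule is an ordinary subprogram, which is why such parsers can be stepped
through in a debugger and need no generation phase.

```ada
with Ada.Text_IO;

--  Hypothetical stand-alone example; none of these names come from
--  OpenToken. It evaluates a small arithmetic expression.
procedure Descent_Demo is

   Syntax_Error : exception;

   Input : constant String := "2+3*(4-1)";
   Pos   : Positive := Input'First;

   --  Current character, or NUL once the input is exhausted.
   function Peek return Character is
   begin
      if Pos <= Input'Last then
         return Input (Pos);
      else
         return ASCII.NUL;
      end if;
   end Peek;

   --  Declared early because Factor and Expression call each other.
   function Expression return Integer;

   --  factor ::= digit {digit} | "(" expression ")"
   function Factor return Integer is
      Value : Integer := 0;
   begin
      if Peek = '(' then
         Pos   := Pos + 1;
         Value := Expression;
         if Peek /= ')' then
            raise Syntax_Error;
         end if;
         Pos := Pos + 1;
         return Value;
      end if;

      if Peek not in '0' .. '9' then
         raise Syntax_Error;
      end if;

      while Peek in '0' .. '9' loop
         Value := Value * 10 + (Character'Pos (Peek) - Character'Pos ('0'));
         Pos   := Pos + 1;
      end loop;
      return Value;
   end Factor;

   --  term ::= factor {"*" factor}
   function Term return Integer is
      Value : Integer := Factor;
   begin
      while Peek = '*' loop
         Pos   := Pos + 1;
         Value := Value * Factor;
      end loop;
      return Value;
   end Term;

   --  expression ::= term {("+" | "-") term}
   function Expression return Integer is
      Value : Integer := Term;
   begin
      loop
         if Peek = '+' then
            Pos   := Pos + 1;
            Value := Value + Term;
         elsif Peek = '-' then
            Pos   := Pos + 1;
            Value := Value - Term;
         else
            return Value;
         end if;
      end loop;
   end Expression;

begin
   --  2 + 3 * (4 - 1) evaluates to 11.
   Ada.Text_IO.Put_Line (Integer'Image (Expression));
end Descent_Demo;
```

A table-driven parser would instead need its tables built from the grammar
before any input could be parsed. Here the grammar is simply the call
structure of the program.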

A general list of the changes is below:

   * Renamed OpenToken.Token tree to OpenToken.Token.Enumerated.
   * Created a new (non-enumerated) base token type and base analyzer type
     in OpenToken.Token.
   * Made a Parse routine and a Could_Parse_To routine primitives of the
     base token type.
   * Created the following predefined nonterminal tokens (both as straight
     types, and as mixins).
        o List
        o Selection
        o Sequence
   * Fixed a bug in the bracketed comment recognizer.
   * Implemented a (hopefully temporary) work-around for a bug in Gnat
     version 3.13p.
   * Fixed a bug in the string recognizer where it was mishandling octal
     and hex escape sequences.
   * Changed the analyzer and the text feeders to support analyzing binary
     files.
   * The HTML lexer has been improved to be a bit faster and more flexible.

Version 2.0

This is the first version to include parsing capability. The existing
packages underwent a major reorganization to accommodate the new
functionality. As some of the restructuring that was done is incompatible
with old code, the major revision has been bumped up to 2. A partial list
of changes is below:

   * Renamed the top level of the hierarchy from Token to OpenToken.
   * Moved the analyzer underneath the new OpenToken.Token hierarchy.
   * Renamed the Token recognizers from Token.* to OpenToken.Recognizer.*
   * Changed the text feeder procedure pointer into a text feeder object.
     This allows full re-entrancy in analyzers, which the global text
     feeders previously prevented.
   * Updated the SLOC counter to read a list of files to process from a
     file. It also handles files with errors in them a bit better.
   * Added LALR(1) parsing capability and numerous packages to support it.
     A structure is in place to build other parsers as well.
   * Created a package hierarchy to support parse tokens. The word "Token"
     in OpenToken now refers to objects of this type, rather than to token
     recognizers.
   * An HTML lexer has been added to the language lexers.
   * .Recognizer.Bracketed_Comment now works properly with single-character
     terminators.

Version 1.3.6

This version fixes a rare bug in the Ada-style based numeric recognizers.
The SLOC counter can now successfully count all the source files in Gnat's
adainclude directory.

Version 1.3.5

This version adds a simple Ada SLOC counting program into the examples. A
bug with the Real token recognizer that caused Constraint_Errors has been
fixed. Also, bugs causing Constraint_Errors in the Ada-style based integer
and real recognizers on long non-based numbers have been fixed.

Version 1.3

This version adds the default token capability to the Analyzer package.
This gives the analyzer a more flexible (if somewhat inefficient) means of
error handling. The default token can be used as an error token, or it can
be made into a non-reportable token to ignore unknown elements entirely.

Identifier tokens were generalized a bit to allow user-defined character
sets for the first and subsequent characters. This not only gives them the
ability to handle syntaxes that don't exactly match Ada's, but it also
allows one to define identifiers for languages that aren't Latin-1 based.
Also, the ability to turn off non-repeatable underscores was added.

Integer and Real tokens had an option added to support signed literals.
This option is set on by default (which causes a minor backward
incompatibility). Syntaxes that have addition or subtraction operators will
need to turn this option off.

A test to verify proper handling of default parameters was added to the
Test directory. A makefile was also added to the same directory to
facilitate automatic compiling and running of the tests. This makefile will
not work in a non-Gnat/NT environment without some modification.

New recognizers were added for enclosed comments (e.g., C's /* */
comments) and single-character escape sequences. Also, a "null" recognizer
was added for use as a default token.


Version 1.2.1

This version adds the CSV field token recognizer that was inadvertently
left out of 1.2. This recognizer was designed to match fields in
comma-separated value (CSV) files, which is a somewhat standard file format
for databases and spreadsheets. Also, the extraneous CVS directories in the
zip version of the distribution were removed.

Version 1.2

The long-awaited string recognizer has been added. It is capable of
recognizing both C and Ada-style strings. In addition, there are a great
many submissions by Christoph Grein in this release. He contributed mostly
complete lexical analyzers for both Java and Ada, along with all the extra
token recognizers he needed to accomplish this feat. He didn't need as many
extra recognizers as I would have thought he'd need. But even so, slightly
fewer than half of the recognizers in this release were contributed by
Chris (with a broken arm, no less!).

Version 1.1

The main code change to this version is a default text feeder function that
has been added to the analyzer. It reads its input from
Ada.Text_IO.Current_Input, so you can change the file to whatever you want
fairly easily. The capability to create and use your own feeder function
still exists, but it should not be necessary in most cases. If you already
have code that does this, it should still compile and work properly.
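
Redirecting Ada.Text_IO.Current_Input takes nothing beyond the standard
library. The sketch below shows one way to do it. The file name
"source.txt" is only a placeholder, and the read loop merely stands in
for an analyzer pulling text through the default feeder; no OpenToken
calls are shown.

```ada
with Ada.Text_IO;

--  Sketch only: "source.txt" is a hypothetical file name, and the loop
--  below stands in for an analyzer reading via the default text feeder.
procedure Redirect_Demo is
   use Ada.Text_IO;

   File : File_Type;
   Line : String (1 .. 256);
   Last : Natural;
begin
   Open (File, In_File, "source.txt");
   Set_Input (File);   --  Current_Input now refers to File

   --  Anything that reads from Current_Input now sees the file's text.
   --  Lines longer than the buffer are simply read in several pieces.
   while not End_Of_File loop
      Get_Line (Line, Last);
      Put_Line (Line (1 .. Last));
   end loop;

   Set_Input (Standard_Input);   --  restore the default before closing
   Close (File);
end Redirect_Demo;
```

Open raises Name_Error if the file does not exist, so real code would
want a handler around it.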

The other addition is the first version of the OpenToken user's guide. All
it contains right now is a user manual walking through the steps needed to
make a simple token analyzer. Feedback and/or ideas on this are welcome.

Version 1.0

This is the very first publicly released version. This package is based on
work I did while working on the JPATS trainer for FlightSafety
International. The germ of this idea came while I was trying to port a
fairly ambitious, but fatally buggy Ada 83 token recognition package
written for a previous simulator. But once I was done, I was rather
surprised at the flexibility of the final product. Seeing the possible
benefit to the community, and to the company through user-submitted
enhancements and debugging, I suggested that this code be released as Open
Source. They were open-minded enough to agree. Bravo!


Future

As it stands, I am developing and maintaining this package as part of my
master's thesis. Thus you can count on a certain amount of progress in the
next few months.

You may notice that most of the stuff I had marked for last release has
been delayed or thrown out. So of course plans do change. :-) But with that
caveat...

Things on my plate for the next release:

   * Better support for error reporting and handling
   * A Reference Manual describing all the routines in all the packages
   * A reference manual generator (with which the above will be created)

Things you can help with:

   * More recognizers - The more of these there are, the more useful this
     facility is. If you make 'em, please send 'em in!
   * Generally usable Tokens - I'm not sure there are as many reusable
     tokens out there as there are reusable recognizers, but I await
     pleasant surprises.
   * Useful token support packages. One example would be a generic symbol
     table creation/lookup package.
   * Well-isolated bug reports (or even fixes). Version 3.0 has quite a few
     changes, as did 2.0. So bugs are much more likely in this version than
     they have been in the past.

Again, I hope you find this package useful for your needs.

T.E.D.  - dennison@telepath.com
