Jericho HTML Parser
Release Notes

2.3   (2006-09-11)
       - Bug Fixes:
         - [1510438] NullPointerException in Source.indent.
         - [1511480] Incorrect detection of a non-HTML element with a nested
           empty-element tag of the same name.
         - [1547562] Fault in caching mechanism.
         - Source.fullSequentialParse() sometimes resulted in unregistered
           tags being returned in tag searches.
         - Invalid empty-element tags whose name is in either of the sets
           HTMLElements.getEndTagOptionalElementNames() or
           HTMLElements.getEndTagRequiredElementNames() were rejected by the
           parser if the slash immediately followed the tag name.
         - StartTag.tidy() only included a slash before the closing delimiter
           of the tag if the tag name was in the set of
           HTMLElements.getEndTagForbiddenElementNames().  It now includes the
           slash for all tag names not in getEndTagOptionalElementNames().
       - Source.fullSequentialParse() now clears the cache automatically
         instead of throwing an IllegalStateException if the cache is not
         empty.
       - Changes to the behaviour of Source.indent:
         - preserves indenting in SCRIPT elements, server elements,
           HTML comments and CDATA sections.
         - keeps SCRIPT elements, HTML comments, XML declarations,
           XML processing instructions and markup declarations inline.
       - Minor documentation improvements.

2.2   (2006-06-20)
       - Bug Fixes:
         - A fault in the caching mechanism (SubCache.findNextTag method)
           resulted in missed tags in rare circumstances.
         - [1407179] Segment.extractText() threw a NullPointerException if
           the last character position was part of a tag.
       - Segment.extractText() now converts some tags to whitespace and
         ignores text inside SCRIPT and STYLE elements.
       - Added Segment.extractText(boolean includeAttributes) option.
       - Added Source.fullSequentialParse() method.
       - Added CharStreamSource interface for dealing with char output.
       - Added Source.indent(String indentText, boolean tidyTags,
         boolean collapseWhiteSpace, boolean indentAllElements) method.
       - Added Segment.getChildElements() method.
       - Added Element.getParentElement() method.
       - Added Element.getDepth() method.
       - Named tag search methods now only return unregistered tags if the
         specified name is not a valid XML tag name.
       - Changed Attributes.DefaultMaxErrorCount system default from 1 to 2.
       - Added EndTag.getElement() method.
       - Added Tag.getElement() abstract method.
       - Added Tag.getNameSegment() method.
       - Added Tag.getUserData() and Tag.setUserData(Object) methods.
       - Added Tag.findNextTag() method.
       - Added Tag.findPreviousTag() method.
       - Added Tag.tidy() and Tag.tidy(boolean toXHTML) methods.
       - Added and renamed many methods in OutputDocument class to make the
         interface more intuitive.
       - Added HTMLElements.getNestingForbiddenElementNames() method.
       - Illegally nested elements with required end tags now terminate at
         the start of the illegally nested start tag, avoiding a possible stack
         overflow in the common case of multiple unterminated <a name=...>
         elements.
       - Tag search methods called with a pos argument that is out of range
         now return null or empty results rather than throwing an exception.
       - Renamed output(Writer) method in OutputSegment to writeTo(Writer).
       - Deprecated Tag.regenerateHTML() method.
       - Deprecated Source.getNextTagIterator() method.
       - Deprecated AttributesOutputSegment class.
       - Deprecated StringOutputSegment class.
       - Removed BlankOutputSegment class from public API.
       - Removed CharOutputSegment class from public API.
       - Removed the IOutputSegment interface, which was deprecated in 2.0.

2.1   (2005-12-24)
       - Added Source(InputStream) constructor.
       - Added Source(Reader) constructor.
       - Added Source(URL) constructor.
       - Added Source.getEncoding() method.
       - Added Source.getEncodingSpecificationInfo() method.
       - Added Source.isXML() method.
       - Added Source.findNextElement(pos) method.
       - Added Source.findNextElement(pos,name) method.
       - Added Segment.extractText() method.
       - Added StartTag.getAttributeValue(attributeName) method.
       - Added Element.getAttributeValue(attributeName) method.
       - Added ExtractText and SourceEncoding sample programs.

2.0   (2005-11-10)
       - Complete rewrite of the parsing engine to allow the encapsulation of
         different tag types into the new TagType class.
       - Requires Java 1.4 or later.
       - All programs written for previous versions of the library will have
         to be recompiled with the new version, regardless of whether any
         changes are required.  This is because several methods, including the
         Source constructor, now expect a CharSequence as an argument instead
         of a String.
       - Changes that could require modifications to existing programs:
         - The toString() method of Segment and all subclasses now returns the
           source text of the segment instead of a string useful for debugging
           purposes.  This change was necessary because Segment now
           implements CharSequence.
         - For consistency, the toString() methods of all IOutputSegment
           implementations now return the output string instead of a string
           useful for debugging purposes.
         - The return type of the OutputDocument.getSourceText() method is now
           CharSequence instead of String.
         - Character references in Attribute.getValue() are now decoded.
         - StartTag.isEmptyElementTag() no longer checks whether the end tag
           is required.
         - Element.getContent() now returns a zero-length segment instead of
           null in the case of an empty element.
         - FormField.getPredefinedValues() now returns an empty collection
           instead of null if the form field has no predefined values.
         - Segment.findAllStartTags() now returns server tags that are found
           inside other tags.
         - The Attributes segment now ends immediately after the last
           attribute instead of immediately before the end-of-tag delimiter.
         - Modified Segment.isWhiteSpace(char) to match the HTML specification.
         - CharacterReference.encode(CharSequence) no longer encodes
           apostrophes by default.
         - Tags of type SERVER_COMMON now always have the name "%" regardless
           of whether an identifier immediately follows it.
         - Modified and enhanced aspects of StartTag searches relating to
           special tags.
         - P elements are now terminated by TABLE elements.
           See the HTMLElementName.P documentation for more information.
       - Removed public fields in the Attribute class that were deprecated
         in 1.2
       - Removed Source.getSourceTextLowerCase() method deprecated in 1.3
       - Removed Source.findEnd(int pos, SpecialTag) method, which was
         accidentally added as a public method in 1.4
       - Deprecated numerous methods (details in javadoc)
       - Deprecated IOutputSegment interface and replaced with OutputSegment
       - Improved caching system
       - Added recognition of markup declarations
       - Added recognition of CDATA sections
       - Added recognition of SGML marked sections
       - Doctype declarations containing markup declarations now supported
       - Segment class now implements CharSequence and Comparable
       - Added getDebugInfo() to Segment and all subclasses to replace the
         previous functionality of the toString() method
       - OutputSegment interface now extends CharSequence
       - Added getDebugInfo() to the OutputSegment interface to replace the
         previous functionality of the toString() method
       - Attributes class now implements List
       - FormFields class now implements Collection
       - Added HTMLElementName interface and HTMLElements class
       - Added RowColumnVector class and associated methods in Source class
       - Added FormControl class
       - Added various methods to the FormField, FormFields and OutputDocument
         classes related to FormControl objects and the manipulation and output
         of form submission values.
       - Added Config and related classes
       - Added TagType class and subclasses
       - Added various tag search methods to the Source and Segment classes
         including searches by TagType, attribute values, and other criteria.
       - Added AttributesOutputSegment class
       - Added Util class
       - Added OverlappingOutputSegmentsException class
       - Added many other methods to existing classes
       - Documentation improvements

1.4.1 (2005-11-10)
       - Bug Fixes:
         - [1065861] Named StartTag search did not find a tag immediately
           following a comment
         - Unnamed StartTag search did not find a comment if the search starts
           at the first character of the comment
         - Character references in FormField.getPredefinedValues() items were
           not decoded
         - FormControlType.SELECT_SINGLE.allowsMultipleValues() returned false
           instead of the correct value of true, resulting in the same
           incorrect value from FormField.allowsMultipleValues() when multiple
           SELECT_SINGLE controls with the same name were present in the form

1.4   (2004-09-02)
       - Added CharacterEntityReference and NumericCharacterReference classes
       - Added CharOutputSegment class
       - Attributes now allow whitespace around the '=' sign
       - Added convenience method Element.getAttributes()
       - Some documentation improvements

1.3   (2004-07-25)
       - Deprecated Source.getSourceTextLowerCase()
       - Added ignoreWhenParsing methods to Source and Segment classes
         (see the JSPTest sample)
       - Added parseAttributes methods to Source, Segment and StartTag classes
       - Added ability to search for tags in a specified namespace
       - Added BlankOutputSegment class
       - Fixed bug relating to HTML comments with alphabetic characters
         immediately following the opening <!-- characters

1.2   (2004-06-16)
       - Deprecated public fields in Attribute class in favour of accessor
         methods
       - The following methods now return an empty list instead of null if
         there is no result (WARNING - this could break existing programs):
          Segment.findAllStartTags(String name)
          Segment.findAllComments()
          Segment.findAllElements(String name)
          Segment.findAllElements()
       - Added hashCode() method to Segment class
       - Server tags such as ASP, JSP, PSP, PHP and Mason are now recognised
       - Basic parser logging introduced (see Source.setLogWriter() method)
       - Start tags with too many badly formed attributes are now rejected
         (reduces the number of false positives when searching for start tags)
       - Added public IOutputSegment.COMPARATOR field
       - Improved caching

1.1   (2004-03-07)
       - All elements defined in HTML 4.01 are recognised and their properties
         used to aid analysis
       - StartTag.getElement() method enhanced to return the correct span of
         elements which have a missing optional end tag
       - StartTag.isEndTagForbidden() method enhanced to also check the name of
         the tag against the list of elements in the HTML spec whose end tags
         are forbidden
       - Numerous new methods
       - Huge performance enhancement from the use of internal caching
       - Bug Fixes:
         - [909944] Parser did not work with unclosed comments.

1.0   (2004-02-07) Initial Release

