NAME
    tv_extractinfo_en - read English-language listings and extract info from
    programme descriptions.

SYNOPSIS
    tv_extractinfo_en [--help] [--output FILE] [FILE...]

DESCRIPTION
    Read XMLTV data and attempt to extract information from English-language
    programme descriptions, putting it into machine-readable form. For
    example the human-readable text '(repeat)' in a programme description
    might be replaced by the XML element <previously-shown>.

    --output FILE write to FILE rather than standard output

    This tool also attempts to split multipart programmes into their
    constituents, by looking for a description that seems to contain lots of
    times and titles. But this depends on the description following one
    particular style and is useful only for some listings sources (Ananova).

    If some text is marked with the 'lang' attribute as being some language
    other than English ('en'), it is ignored.

SEE ALSO
    xmltv(5).

AUTHOR
    Ed Avis, ed@membled.com

BUGS
    Trying to parse human-readable text is always error-prone, more so with
    the simple regexp-based approach used here. But because TV listing
    descriptions usually conform to one of a few set styles,
    tv_extractinfo_en does reasonably well. It is fairly conservative,
    trying to avoid false positives (extracting 'information' which isn't
    really there) even though this means some false negatives (failing to
    extract information and leaving it in the human-readable text).

    However, the leftover bits of text after extracting information may not
    form a meaningful English sentence, or the punctuation may be wrong.

    On the two listings sources currently supported by the XMLTV package,
    this program does a reasonably good job. But it has not been tested with
    every source of anglophone TV listings.

