#!/usr/bin/perl

# $Id: tv_grab_br_net,v 1.19 2008/01/30 21:48:40 b4max Exp $

=pod

=head1 NAME

tv_grab_br_net - Grab TV listings for Brazil's NET Cable service (from nettv.globo.com).

=head1 SYNOPSIS

tv_grab_br_net --help

tv_grab_br_net --configure [--config-file FILE]  [--gui OPTION]

tv_grab_br_net [--config-file FILE] [--output FILE] [--gui OPTION]
               [--days N] [--offset N] [--quiet]
               [--relax N] [--slow] [--fast]

tv_grab_br_net --list-channels [--config-file FILE]
               [--output FILE] [--quiet]


=head1 DESCRIPTION

Output TV listings for several channels available in Brazil from the widely present NET cable provider. The data comes from nettv.globo.com which is their web page containing program information. The grabber relies on parsing HTML so it might stop working at any time.

First run B<tv_grab_br_net --configure> to choose your city, your lineup and which channels you want to download. Should your city not be listed even though NET has it on their homepage...sorry, they are using 2 different formats. Only one is implemented here. Please see B<KNOWN PROBLEMS>.

Then running B<tv_grab_br_net> with no arguments will output listings in XML format to standard output.

=head1 OPTIONS

B<--configure> Ask for configuration-data, which channels to download and write the configuration file.

B<--config-file FILE> Set the name of the configuration file, the default is B<~/.xmltv/tv_grab_br_net.conf>. This is the file written by B<--configure> and read when grabbing.

B<--gui OPTION> Use this option to enable a graphical interface to be used. OPTION may be 'Tk', or left blank for the best available choice. Additional allowed values of OPTION are 'Term' for normal terminal output (default) and 'TermNoProgressBar' to disable the use of Term::ProgressBar.

B<--output FILE> Write to FILE rather than standard output.

B<--days N> Grab N days. The default is five.

B<--offset N> Start N days in the future. The default is to start from today (=zero). Set to -1 to grab data beginning yesterday.

B<--quiet> Suppress the progress messages normally written to standard error.

B<--list-channels> Write output giving <channel> elements for every channel available (using the city and selection from the config file), but no programs. All channels are listed. Even those without any information about broadcasts.

B<--slow> Overrides the default set by the configuration. It enables the long strategy run: There is only basic information listed on the main channel page. However there is a lot more provided on subsequent pages for particular broadcasts. If you'd like to parse the data from these pages as well, set it in the configuration or supply this flag. But consider that the grab process takes much much longer when doing so, since many more web pages have to be retrieved. Please note that this option is a non-standard extension to the XMLTV-options and should be specified at configuration-time.

B<--fast> Overrides the default set by the configuration. It switches back to fast operation if the configuration specified slow. Please note that this option is a non-standard extension to the XMLTV-options and should be specified at configuration-time.

B<--relax N> May be used if time is not an issue, bandwidth is limited or you don't want to pound the NET Website (good idea). It adds a sleep after fetching each page. The number of seconds paused is random and between 1 and N. Since this also is a non-standard extension, you should consider setting it in the configuration. The commandline-option is only meant to override the config.

B<--version> Show the version of the grabber.

B<--help> Print a help message and exit.

=head1 TODO

=over

=item - translate user-interface and documentation to Portuguese

=item - use existing XML and gather more details by using the URL of the broadcasts

=item - logos/images (impossible because not implemented by the NET website)

=item - further preparation for new features in XMLTV 0.6

=back

=head1 KNOWN PROBLEMS

For some channels it seems NET has no data available whatsoever. When grabbing data for "today" or "tomorrow" these channels will automatically be commented out in the used configuration-file. This should not disturb any functionality unless you're keen for "empty" channels.

The NET website is sometimes inconsistent and buggy. And while the grabber might work perfect for some cities, it might not even be configurable for others. This is due to the bundles/selections they offer for the particular cities. It looks like for some cities the site works completely different :-(

NET currently seems to have a problem with the encoding of their data. While their site is coded in ISO-8859-1 their data looks like it has been corrupted by a conversion from UTF-8 (characters replaced with questionmarks). Since this happens at their site there is no way we can work around that, sorry :-(

=head1 SEE ALSO

L<xmltv(5)>.

=head1 AUTHOR

Max Becker, maxbecker -at- rocketmail -dot- com. Based on various other tv_grabbers from the XMLTV-project. Initial inspiration by Cristiano Garcia.

=head1 BUGS

If you happen to find a bug, please send me an email or mail to one of the XMLTV mailing lists, see web pages at I<http://sourceforge.net/projects/xmltv/>.

=head1 COPYRIGHT

Copyright  2005-2006 Max Becker.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

=cut


use strict;
use WWW::Mechanize;
require HTTP::Cookies;
use Date::Manip;

use XMLTV::Ask;
use XMLTV::Configure::Writer;
use XMLTV::Options qw(ParseOptions);

my constant $grabbername = 'tv_grab_br_net';

# definitions for the NET website
my constant $domain = 'nettv.globo.com';
my constant $base = "http://$domain";
my constant $pname_sel = 'selecaoid';
my constant $pname_ch = 'canalid';
my constant $pname_city = 'cidade_id';
my constant $pname_form = 'Cidade_form';
my constant $retry_max = 3;
my constant $retry_sleep = 10;
my constant $lang = 'pt_BR';
my constant $maxrating = 5; # maximum star-rating

my $rev = q$Revision: 1.19 $; # set from CVS-revision
$rev =~ s|^.*?(\d+(\.\d+)+).*?$|sprintf("%d." . "%s" x9, map sprintf("%03d", $_), split /\./, $1)|e;

# XML-encoding
my constant $enc = 'ISO-8859-1';

# init timezone
Date_Init("TZ=BRT");

# global variables
my ($channels, $bar, $writer, $slow, $fast, $relax);

my ($opt, $conf) = ParseOptions( {
    grabber_name => $grabbername,
    capabilities => [qw(baseline manualconfig tkconfig apiconfig)],
    stage_sub => \&config_stage,
    listchannels_sub => \&list_channels,
    version => '$Id: tv_grab_br_net,v 1.19 2008/01/30 21:48:40 b4max Exp $',
    description => "Brazil (NET)",
    extra_options => [ "slow"    => \$slow,
		       "fast"    => \$fast,
		       "relax=n" => \$relax ],
} );

# fill $channels with the data from the configuration
&fill_ch();

# grab some data
&grab();


# display the list of available channels for the current setup
sub list_channels {
    ($conf, $opt) = @_;

    my $mech = &init();
    my $chs = &channellist($mech);

    my $xml = '';
    $writer = &build_writer(\$xml);
    &write_channels($writer, $chs);
    $writer->end();
    return $xml;
}


# fetch list of channels
sub channellist {
    my $mech = shift;
    my $chs;
    my $url = "/NETServ/br/prog/canais.jsp?$pname_sel=$conf->{selection}->[0]";

    my $html = &fetch($mech, $base . $url);

    # parse content
    # each channel on this page should match the regexp below
    while ($html =~ s/canais_detalhes\.jsp\?.*?\Q$pname_ch\E=(\d+).*?<span[^>]*>\s*(\d+)\s*\-\s*(.*?)<\/span>//s) {
	my $number = $2 + 0; # convert string to number
	$chs->{$number}->{ID} = $1;
	$chs->{$number}->{Number} = $number;
	$chs->{$number}->{Name} = $3;
	# remove weird trailing star (no explanation on the site)
	$chs->{$number}->{Name} =~ s/\*$//;
	$chs->{$number}->{rfc2838} = &rfc2838($chs->{$number});
    }

    unless ($chs && scalar keys %$chs) {
	$bar->finish() if $bar;
	say("No channels found!\nThis likely means that the NET website has changed.\n");
	exit 1;
    }

    return $chs;
}


# grab broadcasts
sub grab {
    my $mech = &init();
    $writer = &build_writer();

    # warn if trying to fetch data out of range
    say("Can't fetch program data longer ago than yesterday! Ignoring.")
	if $opt->{offset} < -1 && ! $opt->{quiet};

    # calculate last date to retrieve
    my $maxdate = DateCalc("today", "+ " . ($opt->{offset} + $opt->{days}) . " days");
    # calculate absolute maximum supported (4 weeks)
    my $absmax = DateCalc("today", "+ 4 weeks");

    if (Date_Cmp($absmax, $maxdate) < 0) {
	say("Can't fetch program data further than 4 weeks in the future! Ignoring.") unless $opt->{quiet};
	# set last date to maximum supported value
	$maxdate = $absmax;
    }

    my $ch_written = 0;
    my @skiplist;

    # loop over days
    for my $offset ($opt->{offset} .. $opt->{offset} + $opt->{days} - 1) {

	next if $offset < -1;
	my $date = DateCalc("today", "+ $offset days");

	# finish if going beyond specs
	last if Date_Cmp($maxdate, $date) < 0;
	my $date_formatted = UnixDate($date, '%d/%m/%Y');

	# set up progress-bar
	$bar = new XMLTV::ProgressBar( {
	    name => "fetching data for $date_formatted",
	    count => scalar(keys %$channels)
	    } ) unless $opt->{quiet};

	# loop over channels
	foreach (keys %$channels) {

	    update $bar if $bar;

	    # skip if ignore is set
	    next if $channels->{$_}->{ignore};

	    # free memory (previous days)
	    $channels->{$_}->{Broadcasts} = undef;

	    # fetch basic information of broadcasts
	    &channel($mech, $channels->{$_}, $date, $date_formatted);
	    sleep(int(rand($relax) + 1)) if $relax;
	}

	# terminate progess-bar
	$bar->finish() if $bar;

	# write channel information (only once)
	unless ($ch_written) {
	    &write_channels($writer, $channels);
	    $ch_written = 1;
	}

	# write program information (broadcast)
	# loop over channels
	foreach my $ch (map $channels->{$_}, keys %$channels) {

	    # skip if ignore is set
	    next if $ch->{ignore};

	    # set up progress-bar for details
	    $bar = new XMLTV::ProgressBar( {
		name => "fetching details for $ch->{Name} $date_formatted",
		count => scalar(keys %{$ch->{Broadcasts}})
		} ) if $slow && ! $opt->{quiet} && scalar(keys %{$ch->{Broadcasts}});

	    # loop over broadcasts
	    foreach my $bc (map $ch->{Broadcasts}->{$_}, keys %{$ch->{Broadcasts}}) {

		# fetch all the details?
		if ($slow) {

		    if ($bc->{URL}) {
			update $bar if $bar; # progress-bar

			# skip if it's a repetition
			my $id = join('|', $ch->{rfc2838}, $bc->{Time});
			next if grep $_ eq $id, @skiplist;

			# fetch details
			# and loop over repetitions of this broadcast
			foreach my $time (&details($mech, $bc)) {

			    # skip if out of date-range (avoiding collisions)
			    next if Date_Cmp(ParseDate($time), $maxdate) > 0;

			    # copy broadcast
			    my $rep;
			    map $rep->{$_} = $bc->{$_}, keys %$bc;

			    # set time
			    $rep->{Time} = $time;

			    # write out
			    &write_broadcast($writer, $rep, $ch->{rfc2838});

			    # add to skip-list
			    push @skiplist, join('|', $ch->{rfc2838}, $time);
			}
			sleep(int(rand($relax) + 1)) if $relax;

		    }else{ # no URL -> no details
			say("No URL found for $ch->{Name} $bc->{Time} $bc->{Title}!\nPlease check. Maybe the NET website has changed.\n");
		    }
		}

		# write out broadcast
		&write_broadcast($writer, $bc, $ch->{rfc2838});
	    }
	    $bar->finish() if $bar;
	}
    }
    $writer->end();
    say "fetch completed" unless $opt->{quiet};
}


# fetch broadcasts for one channel (including additional channel-info)
sub channel {
    my ($mech, $ch, $date, $date_formatted) = @_;

    my $url = "/NETServ/br/prog/canais_detalhes.jsp?$pname_sel=$conf->{selection}->[0]&$pname_ch=$ch->{ID}&data=$date_formatted";

    my $html = &fetch($mech, $base . $url);

    # parse additional channel details unless already there
    &parsechannel($ch, $html) unless $ch->{Abbreviation};

    # parse basic broadcast information if date is given
    if ($date) {
	my $counter = 0;

	# get time, URL, title and episode
	while ($html =~ s/<tr>.*?>(\d?\d)h(\d\d)<.*?href=\"(\/NETServ\/br\/prog\/programa.jsp\?[^\"]+)\"[^>]*>(.*?)<\/a><\/td>.*?<\/tr>//si) {

	    my $url = $3;
	    my $title = $4;
	    my $time = UnixDate(Date_SetTime($date, $1, $2), '%q %z');
	    $ch->{Broadcasts}->{$time}->{URL} = $url;
	    $title =~ s/^\s+//; # remove leading whitespace
	    $title =~ s/\s+/ /g; # remove consecutive whitespace
	    $title =~ s/^\((.*)\)$/$1/; # remove weird surrounding parens

	    # extract episode-number
	    if ($title =~ s/\s*\-\s*Ep(is(|&oacute;)dio|\.)\s*(\d+)//) {
		$ch->{Broadcasts}->{$time}->{Episode} = $3;
	    }

	    $ch->{Broadcasts}->{$time}->{Title} = $title;
	    $ch->{Broadcasts}->{$time}->{Time} = $time;

	    $counter++;
	}
	unless ($counter) {
	    # If date is today or tomorrow and there is no information
	    # available we consider this channel "dead" and comment it
	    # out in the config without user-interaction.

	    if (Date_Cmp($date, DateCalc("today", "+ 1 days")) < 0 &&
		Date_Cmp($date, DateCalc("today", "- 1 days")) > 0) {

		say("No data available. Removing $ch->{Name}!") unless $opt->{quiet};

		&delete_ch($ch);
	    }else{
		say("Channel $ch->{Name} has no broadcast information available.\nConsider removing it from your configuration.\nIf this happens to all channels its likely that the NET website changed.") unless $opt->{quiet};
	    }
	}
    }
}


# parse additional channel-info
sub parsechannel {
    my ($ch, $html) = @_;

    my $num = $ch->{Number} + 0; # convert to number

    # name, abbreviation and description
    if ($html =~ /<LI><SPAN[^>]*>([^<]*)<\/SPAN><BR>\s*<SPAN[^>]*>(...)?\s+\Q$num\E<\/SPAN><BR>\s*<SPAN[^>]*>([^<]*)<\/SPAN>\s*<\/UL>/si) {
	$ch->{Name} = $1 unless $ch->{Name};
	$ch->{Abbreviation} = $2 if $2 && $2 ne $ch->{Name};
	$ch->{Description} = $3;
	# remove consecutive whitespace
	$ch->{Description} =~ s/\s+/ /g;
	# remove leading and trailing whitespace
	$ch->{Description} =~ s/^\s*(.*)\s*$/$1/g;
	# remove weird trailing star (no explanation on the site)
	$ch->{Name} =~ s/\*$//;
    }else{
	&delete_ch($ch);
	# remove inconsistencies! Maybe give a hint to reconfigure?
    }

    # not implemented: Logo of channel
    # not a single channels seems to have been implemented with a logo by NET

    # gather URL under image "VEJA O SITE DO CANAL"
    if ($html =~ /href\s*=\s*\"([^\"]+)\"[^>]*><img\s+src\s*=\s*[\'\"\w\/]*?sitecanal\.gif/si) {
	$ch->{URL} = $1 if $1;
    }
}


# fetch details for a broadcast
sub details {
    my ($mech, $bc) = @_;

    my $html = &fetch($mech, $base . $bc->{URL});

    # not implemented: Logo of broadcast
    # not a single broadcast seems to have been provided with a logo by NET

    # directors
    push @{$bc->{Director}}, split (/,\s*/, $3)
	if $html =~ /<b[^>]*>Dire(&ccedil;|)(&atilde;|)o:<\/b>\s*<[^>]+>\s*(.*?)\.?<\//is;

    # actors
    if ($html =~ /<b[^>]*>Elenco:<\/b>\s*<[^>]+>\s*(.*?)\.?<\//is) {
	foreach (split (/,\s*/, $1)) {
	    # remove 'Vozes de'
	    # in case of a comic the person speaking a character is the actor
	    s/^vozes\s+de\s+//i;
	    push @{$bc->{Actors}}, $_;
	}
    }

    # year
    $bc->{Date} = $1 if $html =~ /<b[^>]*>ano:<\/b>\s*<[^>]+>\s*(\d{2,4})<\//is;

    # description
    if ($html =~ /<b[^>]*>sinopse:<\/b>\s*<[^>]+>\s*(.*?)\s*<\//is) {
	$bc->{Description} = $1;
	$bc->{Description} =~ tr/\n\r/  /; # remove linebreaks
	$bc->{Description} =~ s/\s+/ /g; # remove consecutive whitespace
	delete $bc->{Description} if $bc->{Description} =~ /Sinopse\s+N(|&atilde;)o\s+Dispon(|&iacute;)vel/i;
    }

    # length
    $bc->{Length} = $3 if $html =~ /<b[^>]*>dura(&ccedil;|)(&atilde;|)o:<\/b>\s*<[^>]+>\s*(\d+)\s*minutos<\//is;

    # star-rating
    if ($html =~ /<b[^>]*>avalia(&ccedil;|)(&atilde;|)o:<\/b>\s*((<img\s+src\s*=\s*([\'\"])([^\'\"]+)\5\s+[^>]+>)+)\s*<\//is) {
	$bc->{star_icon} = "$base$6";
	my $stars = $3;
	my $rating;
	$rating++ while $stars =~ s/(<img[^>]+>)//i;
	$bc->{star_rating} = "$rating / $maxrating";
    }

    # category (no longer provided)
#    $bc->{Category} = $1 if $html =~ /<td><b>G&ecirc;nero:<\/b>&nbsp;(.*?)<\/td>/s;

    # original name
    $bc->{original_name} = $1 if $html =~ /<b[^>]*>nome original:<\/b>\s*<[^>]+>\s*(.*?)<\//is;
    # remove episode-number from original name
    $bc->{original_name} =~ s/\s*\-\s*Eps?\.?\s*\Q$bc->{Episode}\E//i;

    # rating (eg. PG13)
    $bc->{rating} = $3 if $html =~ /<b[^>]*>classifica(&ccedil;|)(&atilde;|)o:<\/b>\s*<[^>]+>\s*(.*?)<\//is;

    # Gather times when this broadcast is repeated
    # to skip duplicate queries for the same show
    my @repeat;

    while ($html =~ s/<INPUTs+TYPE="Checkbox"\s+name="horaAlerta"\s+value="\d+"><\/TD>\s*<TD[^>]+>([^>]+)<\/TD>//si) {

	my $time = $1;

	# convert time to usable format
	$time =~ s/^.*,\s+(\d+)\/(\d+)\/(\d+)[\D]+(\d+)h(\d+)/UnixDate(ParseDate("$3$2$1 $4:$5"), '%q %z')/e;

	# add to repetition list if its not the broadcast itself
	push @repeat, $time unless $time eq $bc->{Time};
    }
    return @repeat;
}


# write channellist
sub write_channels {
    my ($writer, $chans) = @_;

    # available channel-properties:
    # Number, Name, Abbreviation, Description, rfc2838, URL

    # Todo: Move the channel-number to a separate element for the new 0.6-DTD

    my $ch;
    map {
	$ch->{"$chans->{$_}->{rfc2838}"} = {

	    'id' => $chans->{$_}->{rfc2838},

	    'display-name' => [ [ "$chans->{$_}->{Name}", $lang ] ]
	    };

	push @{$ch->{"$chans->{$_}->{rfc2838}"}->{'display-name'}},
	[ "$chans->{$_}->{Abbreviation}", $lang ]
	    if $chans->{$_}->{Abbreviation};

	# add channel-URL if available
	$ch->{"$chans->{$_}->{rfc2838}"}->{url} = [ $chans->{$_}->{URL} ] if $chans->{$_}->{URL};


    } grep !$chans->{$_}->{ignore}, keys %$chans;
    $writer->write_channels($ch);
}


# write XML of one show
sub write_broadcast {
    my ($writer, $p, $ch) = @_;

    # basic information
    my $prog = {
	channel => $ch,
	start   => $p->{Time},
	title   => [[$p->{Title}, $lang]]
	};

    # add URL if available
    $prog->{url} = [ $base . $p->{URL} ] if $p->{URL};

    # add episode if available
    $prog->{'episode-num'} = [ [$p->{Episode}, 'onscreen'] ] if $p->{Episode};

    # details only if requested
    if ($slow) {
	# add original title
	# language is not given by NET, but most of the time its english
	push @{$prog->{title}}, [$p->{original_name}, 'en']
	    if $p->{original_name} && $p->{original_name} ne $p->{Title};

	# add description
	$prog->{desc} = [ [$p->{Description}, $lang] ] if $p->{Description};

	# add date (year)
	$prog->{date} = $p->{Date} if $p->{Date};

	# add credits
	$prog->{credits}->{director} = $p->{Director} if $p->{Director};
	$prog->{credits}->{actor} = $p->{Actors} if $p->{Actors};

	# add length (in seconds)
	$prog->{length} = $p->{Length} * 60 if $p->{Length};

	# rating
	$prog->{rating} = [ [$p->{rating}] ] if $p->{rating};

	# star-rating
	if ($p->{star_rating}) {
	    my $image = {};
	    $image = {src => $p->{star_icon}} if $p->{star_icon};
	    $prog->{'star-rating'} = [ $p->{star_rating}, [$image] ];
	}
    }

    # write output
    $writer->write_programme($prog);
}


# create XMLTV::Writer Object
sub build_writer {
    my $content = shift;

    my %writer_args;

    # define output
    if ($opt->{output}) {
	my $handle = new IO::File(">$opt->{output}");
	unless (defined $handle) {
	    say("cannot write to output file, $opt->{output}!\n$!\n");
	    exit 1;
	}
	$writer_args{'OUTPUT'} = $handle;
    }elsif(ref($content) && defined $content) {
	$writer_args{'OUTPUT'} = $content;
    }

    # set encoding
    $writer_args{'encoding'} = $enc;

    my $writer = new XMLTV::Writer(%writer_args);

    # write header
    $writer->start( {
	'generator-info-name'  => "$grabbername/$rev",
	'generator-info-url'   => 'http://membled.com/work/apps/xmltv/',
	'source-data-url'      => "$base/NETServ/br/prog/canais.jsp?$pname_sel=$conf->{selection}->[0]",
	'source-info-url'      => "$base/",
	'source-info-name'     => 'NET website',
	'date'                 => UnixDate(ParseDate('today'), '%Q')
# The line below does not validate correctly.
# If the time is included it is also (incorrectly) compared.
# So the date only :-(
#	'date'                 => UnixDate(ParseDate('today'), '%q %z')
	} );
    return $writer;
}


# initialize mech and set cookie for city
sub init {
    my $mech = shift;
    my $c = shift || $conf->{city}->[0];

    if ($mech) {
	# reset cookies
	$mech->cookie_jar(HTTP::Cookies->new);
    }else{
	$mech = &initmech();
    }

    &fetch($mech, "$base/NETServ/br/home/html/1.jsp");

    # submit city and get cookie(s)
    $mech->submit_form( form_name => $pname_form, fields => { $pname_city => $c } );

    # retry on error
    my $retry = 1;
    while (! $mech->success() && $retry < $retry_max) {
	sleep $retry_sleep;
	$retry++;
	$mech->submit_form( form_name => $pname_form, fields => { $pname_city => $c } );
    }
    &net_error("Failed to submit city for retrieving cookie" ) unless $mech->success();

    return $mech;
}


# initialization for the mech-connection
sub initmech {
    my $mech = WWW::Mechanize->new( stack_depth => 1,
				    onerror => undef,
				    agent => "xmltv/$XMLTV::VERSION" );
    $mech->cookie_jar(HTTP::Cookies->new);
    $mech->env_proxy();
    $mech->default_header(Accept => "text/*");
    return $mech;
}


# fetch list of available cities
sub citylist {
    my $url = '/NETServ/br/home/html/1.jsp';
    my $mech = &init(undef, 1); # 1 - aka So Paulo
    update $bar if $bar;

    my $html = &fetch($mech, $base . $url);
    update $bar if $bar;

    my %cities;
    if ($html =~ /this\.states\s*\=\s*\[(.*?)\]\;\s*this\.idstates\s*=\s*\[(.*?)\]/is) {
	my $names = $1;
	my $values = $2;
	my @n = map /\"([^\"]+)\"/, split(/\,/, $names);
	my @v = map /\"([^\"]+)\"/, split(/\,/, $values);
	for (0 .. $#n) {
	    $cities{$v[$_]} = $n[$_];
        }
    }
    return ($mech, \%cities);
}


# fetches a list of available bundles for the selected city
sub bundlelist {
    my $mech = shift;
    my $url = '/NETServ/br/prog/canais.jsp';

    update $bar if $bar;

    my $html = &fetch($mech, $base . $url);

    my $bundles = {};
    my @params;

    &parsebundles($html, $url, $bundles);

    # select digital or analog or ???
    if ($html =~ /<INPUT\s+TYPE\s*=\s*(\"|\')?radio\1\s+NAME\s*=\s*(\"|\')?padrao\2/si) {
	my $x = $html;
	# extract parameters from radio buttons
	while ($x =~ s/<INPUT\s+TYPE\s*=\s*(\"|\')?radio\1\s+NAME\s*=\s*(\"|\')?padrao\2\s*[^>]*?\s+onclick\s*=\s*(\"|\').*?\Q$url\E(\?\w+=\d+).*?\3\s*(checked)?\s*>//si) {
	    next if $5; # we are looking at this setup right now
	    push @params, $4; # put the rest on a stack
	}
    }

    # work on the rest of the available bundles
    foreach (@params) {
	update $bar if $bar;
	&parsebundles(&fetch($mech, $base . $url . $_), $url, $bundles);
    }

    return $bundles;
}


# parse bundlename and ID from the HTML
sub parsebundles {
    my ($html, $url, $bundles) = @_;

    # extract name and URL-parameters of the bundles
    while ($html =~ s/<A\s+HREF\s*=\s*(\"|\')?\Q$url\E\?([\w\&\?=]+)\1\s*[^>]*>([^<]+)<\/A>//si) {
	my $name = $3;
	my $params = $2;
	# only take the value of parameter $pname_sel
	my ($selection) = $params =~ /\Q$pname_sel\E=(\d+)/;
	$bundles->{$selection} = $name;
    }
}


# Trying to create a URI conforming to RFC2838

# Net has their own IDs for the channels. They are visible in the
# config-file and if you look at their HTML-sources. They're needed
# to fetch the data but have no other purpose. I simply can't tell
# if they're unique identifiers throughout their site. We concatenate
# channelnumber, Net-ID, lineup, city and the base-url of their
# website into the XMLTV id. The city and lineup make it possible to
# merge (tv_cat) results from different lineups and cities without
# conflicts (there might be duplicates).

sub rfc2838 {
    my $ch = shift;
    # concatenate channelnumber, channel-ID, selection, city and domain
    return sprintf("%03d.%d.%d.%d.%s",
		   $ch->{Number}, $ch->{ID}, $conf->{selection}->[0], $conf->{city}->[0], $domain);
}


# some error occurred gathering data from the website
sub net_error {
    my $err = shift;
    $bar->finish() if $bar;

    # finish with a probably incomplete but valid file
    $writer->end() if $writer;

    say ("ERROR: $err\nThere was a problem grabbing data.\nPlease check your connection/proxy.\nMaybe even the NET website was changed. Please check there and read the manual :-(\n");

    exit 1;
}


# wrapper for mech->get() and mech->content
# with configurable support for retry on failure
sub fetch {
    my ($mech, $url, $retry) = @_;
    $mech->get($url);
    unless ($mech->success()) {
	if ($retry < $retry_max) {
	    $retry++;
	    sleep $retry_sleep;
	    return &fetch($mech, $url, $retry);
	}else{
	    &net_error("HTTP " . $mech->status() . " Failed to fetch $url");
	}
    }
    return $mech->content();
}


# comment out a channel from the config and set the ignore-flag
sub delete_ch {
    my $ch = shift;
    $ch->{ignore} = 1;

    # silently ignore if config-file is not writable
    return unless -w $opt->{'config-file'};

    my @config_lines;

    # read original (with comments)
    open(CONFIG, "<$opt->{'config-file'}") || return;
    foreach (<CONFIG>) {
	s/^(\s*channel\s*)=(\s*\Q$ch->{rfc2838}\E\s*)$/$1!$2/g;
	push @config_lines, $_;
    }
    close CONFIG;

    # write modified config
    open(CONFIG, ">$opt->{'config-file'}") || return;
    print CONFIG join '', @config_lines;
    close CONFIG;
}


# fill global variable $channels with values from the config
sub fill_ch {
    # used channels
    foreach (@{$conf->{channel}}) {
	if (/^((\d{3})\.(\d+)\.\Q$conf->{selection}->[0]\E\.\Q$conf->{city}->[0]\E\.\Q$domain\E)$/) {
	    $channels->{$2}->{rfc2838} = $1;
	    $channels->{$2}->{Number} = $2 + 0;
	    $channels->{$2}->{ID} = $3;
	}else{
	    die "configfile corrupt! Please run with --configure\n";
	}
    }

    die "No channels in config! Please run with --configure\n"
	if scalar keys %$channels < 1;

    # unused channels
    foreach (@{$conf->{no_channel}}) {
	if (/^((\d{3})\.(\d+)\.\Q$conf->{selection}->[0]\E\.\Q$conf->{city}->[0]\E\.\Q$domain\E)$/) {
	    $channels->{$2}->{ignore} = 1;
	    $channels->{$2}->{rfc2838} = $1;
	    $channels->{$2}->{Number} = $2 + 0;
	    $channels->{$2}->{ID} = $3;
	}else{
	    die "configfile corrupt! Please run with --configure\n";
	}
    }

    # slow / fast
    if ($fast) {
	$slow = undef;
    }elsif($slow) {
	$slow = 1;
    }elsif($conf->{slow}->[0]) {
	$slow = 1;
    }else{
	$slow = undef;
    }

    # relax
    unless (defined $relax) {
	$relax = $conf->{relax}->[0];
    }
    $relax = undef if $relax !~ /^\d+$/;
}


# configuration via api
sub config_stage {
    my ($stage, $conf) = @_;
    my $result;
    my $writer = new XMLTV::Configure::Writer( OUTPUT => \$result,
					       encoding => $enc );

    $writer->start( { grabber => $grabbername } );
    if ($stage eq 'start') { # first stage selects the city

	# set up progress-bar
	$bar = new XMLTV::ProgressBar({name => 'looking up cities', count => 44})
	    unless $opt->{quiet};

	# fetch list of available cities
	my ($mech, $cities) = &citylist();

	$bar->finish() if $bar;

	# nothing found?!?
	&net_error('Failed to fetch citylist') unless scalar keys %$cities;

	# title and desc for city
        $writer->start_selectone( {
            id => 'city',
            title => [ [ 'City', 'en' ], [ 'Cidade', $lang] ],
            description => [ [ 'City you want to download data for', 'en' ],
			     [ 'Cidade para que voc quer a programao', $lang] ]
			     } );

	# write sorted citylist
        foreach (sort {$cities->{$a} cmp $cities->{$b}} keys %$cities) {
            $writer->write_option( { 
                value => $_,
                text => [ [ $cities->{$_}, $lang ] ]
		} );
        }

        $writer->end_selectone();
        $writer->end('select-lineup');
	# end of city-selection


    }elsif ($stage eq 'select-lineup') { # 2nd stage for lineup-selection

	# init for selected city
	my $mech = &init(undef, $conf->{city}->[0]);

	# set up progress-bar
	$bar = new XMLTV::ProgressBar({name => "looking up available bundles", count => 8})
	    unless $opt->{quiet};

	# fetch list of available bundles/selections for the selected city
	my $bundles = &bundlelist($mech);

	$bar->finish() if $bar;

	# no bundles?
	&net_error("Failed to fetch bundlelist for city $conf->{city}->[0]") unless scalar keys %$bundles;

	# title and desc for bundle
        $writer->start_selectone( {
            id => 'selection',
            title => [ [ 'Bundle', 'en' ], [ 'Grade de canais', $lang] ],
            description => [ [ 'Bundle you want to download data for', 'en' ],
			     [ 'Grade de canais para programao', $lang] ]
			     } );

	# write sorted bundlelist
        foreach (sort {$bundles->{$a} cmp $bundles->{$b}} keys %$bundles) {
            $writer->write_option( {
                value => $_,
                text => [ [ $bundles->{$_}, $lang ] ]
		} );
        }

        $writer->end_selectone();
        $writer->end('options');
	# end of bundle-selection. Next are options


    }elsif ($stage eq 'options') { # 3rd stage for options

	# title and desc
        $writer->start_selectone( {
            id => 'slow',
            title => [ [ 'Enable details (slow)', 'en' ],
		       [ 'Ligar detalhes (lento)', $lang] ],
            description => [ [ 'Enable mode with more details (credits, rating, etc.)? This takes about 20 times longer and produces about 2 times more output', 'en' ],
			     [ 'Ativar o modo mais detalhado (creditos, ano, etc.)? Esta muito mais lento (20 vezes) e produz 2 vezes mais detalhes.', $lang] ]
			     } );

	$writer->write_option( {
	    value => 0,
	    text => [ [ 'off', 'en' ], [ 'desligado', $lang ] ]
	    } );

	$writer->write_option( {
	    value => 1,
	    text => [ [ 'on', 'en' ], [ 'ligado', $lang ] ]
	    } );

	$writer->end_selectone();


	# relax
        $writer->write_string( {
            id => 'relax',
            title => [ [ 'Enable relaxed mode', 'en' ],
		       [ 'Ativar modo relaxado', $lang] ],
            description => [ [ 'Enable pausing between page-fetches in order not to look like a robot. Specify a number in seconds or 0 to disable.', 'en' ],
			     [ 'Este modo faz uma pausa entre a busca das paginas. Por favor digite 0 para desligar ou digite um numero em segundos.', $lang] ],
	    default => 0
	    } );

        $writer->end('select-channels');
	# end of options. Next are channels


    } else {
	die "Unknown stage $stage";
    }

    return $result;
}
