----------- Introduction, copyright, license

debdelta is an application suite designed to compute changes between
Debian packages. These changes (deltas) are similar to the output of
the "diff" program in that they may be used to store and transmit only
the changes between Debian packages.  This suite contains
'debdelta-upgrade', which downloads debdeltas and uses them to create
all Debian packages needed for an 'apt-get upgrade'.

debdelta is  Copyright (C) 2006-07 Andrea Mennucci

debdelta is free software.  See the file COPYING for copying conditions.


debdelta uses 'minigzip', which is a simplified version of
  /usr/share/doc/zlib1g-dev/examples/minigzip.c.gz

minigzip is released under a permissive license
(see minigzip.c or debian/copyright).

minigzip is Copyright (C) 1995-2002 Jean-loup Gailly

---------- Description

The debdelta application suite is composed of several applications.

---- debdelta

'debdelta'  computes the delta, that is, a file that encodes the difference
between two Debian packages.

Example:

$ a=/var/cache/apt/archives 
$ debdelta -v $a/emacs-snapshot-common_1%3a20060512-1_all.deb $a/emacs-snapshot-common_1%3a20060518-1_all.deb /tmp/emacs.debdelta

the result is:
 deb delta is  12.5% of deb ; that is, 15452kB would be saved

---- debpatch

'debpatch' can use the delta file and a copy of the old Debian package to
recreate the new Debian package. If the old Debian package is not available,
but is installed on the host, it can use the installed data; in this
case, '/' is used in lieu of the old .deb.

Example:

$ debpatch  /tmp/emacs.debdelta / /tmp/emacs.deb

---- debdeltas

'debdeltas' can be used to generate deltas for many debs at once.
It will generate delta files with names such as
 package_old-version_new-version_architecture.debdelta
and put them in the directory where the new .deb is.

If the delta exceeds ~80% of the deb, 'debdeltas' will delete it
and leave a stamp of the form
 package_old-version_new-version_architecture.debdelta-too-big

Example :

$ debdeltas /var/cache/apt/archives/*deb
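As a rough illustration of the naming scheme and of the size threshold
described above, here is a small Python sketch. The helper names
'delta_name' and 'final_name' are hypothetical, and the 0.80 cutoff is only
the approximate figure quoted in this text; this is not the code that
'debdeltas' actually runs.

```python
def delta_name(package, old_version, new_version, architecture):
    """Build a delta file name following the pattern shown above."""
    return "%s_%s_%s_%s.debdelta" % (
        package, old_version, new_version, architecture)


def final_name(name, delta_size, deb_size, max_ratio=0.80):
    """Return the file name left on disk for a freshly computed delta.

    If the delta is larger than max_ratio times the new .deb, only a
    '-too-big' stamp is kept; otherwise the delta keeps its own name.
    max_ratio=0.80 is an assumption taken from the '~80%' in the text.
    """
    if delta_size > max_ratio * deb_size:
        return name + "-too-big"
    return name


name = delta_name("udev", "0.092-2", "0.093-1", "i386")
print(final_name(name, delta_size=15, deb_size=100))
print(final_name(name, delta_size=90, deb_size=100))
```

The first call keeps the .debdelta name, the second yields the stamp name.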

With the --dir argument, it will put the deltas in a different tree
(this is necessary if you use 'debmirror', since 'debmirror' will
 delete any file that it does not recognize).

Example:

$ m=where_your_mirror_is
$ d=where_to_put_deltas
$ cd $m
$ find pool -type d -mtime -1   |  xargs -r  debdeltas --dir $d// 

The // means that the pool directory tree will be mimicked in the deltas
directory tree.
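To make the idea of "mimicking" concrete, the following Python sketch shows
one plausible mapping from the path of a .deb inside the mirror to the
directory that would receive its delta. The function name 'delta_dir_for'
and the example paths are hypothetical, and the exact behaviour of
'debdeltas --dir' may differ in details.

```python
import posixpath


def delta_dir_for(deb_path, delta_root):
    """Mirror the directory of deb_path under delta_root.

    deb_path is assumed to be relative to the mirror root, as it is
    after the 'cd $m' in the example above.
    """
    relative_dir = posixpath.dirname(posixpath.normpath(deb_path))
    return posixpath.join(delta_root, relative_dir)


print(delta_dir_for("pool/main/u/udev/udev_0.093-1_i386.deb", "/srv/deltas"))
```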

----------- debdelta-upgrade

This command downloads the necessary deltas from my mirror
and uses them to create the debs for an 'apt-get upgrade'.

This is currently a hack; it should be replaced by an APT method
(this is work in progress), but for this I need help from the APT and
python-apt authors.

Example usage:
# apt-get update && debdelta-upgrade && apt-get upgrade

If run by a non-root user, debs are saved in /tmp/archives: do not
 forget to move them to /var/cache/apt/archives.

debdelta-upgrade will also download .debs for which no delta is
available (this is done in parallel to patching, to maximize speed).

Work is in progress so that debdelta-upgrade will choose whether to
- download a delta and use it to create the .deb, or
- download the .deb itself,
depending on which one would be faster.
Unfortunately, this decision depends on a good model
for predicting the speed of patching, and I have not yet been able
to build one.
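The choice described above could look like the following Python sketch. It
assumes the simplest possible model, in which the patching time is the size
of the new .deb divided by a constant patching speed; finding a model that
is actually reliable is exactly the open problem. The function names and
the sample numbers are hypothetical, and none of this is taken from the
debdelta-upgrade code.

```python
def estimated_times(deb_size, delta_size, download_speed, patch_speed):
    """Return (seconds via delta, seconds via full download).

    Sizes are in kB, speeds in kB per second. The constant patch_speed
    is a simplifying assumption.
    """
    via_delta = delta_size / download_speed + deb_size / patch_speed
    via_deb = deb_size / download_speed
    return via_delta, via_deb


def choose(deb_size, delta_size, download_speed, patch_speed):
    """Return 'delta' if fetching and applying the delta looks faster."""
    via_delta, via_deb = estimated_times(
        deb_size, delta_size, download_speed, patch_speed)
    return "delta" if via_delta < via_deb else "deb"


# Slow line: the delta wins. Fast line: downloading the .deb wins.
print(choose(80000, 16000, download_speed=80, patch_speed=200))
print(choose(80000, 16000, download_speed=1000, patch_speed=200))
```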


-------------- Statistics

I am currently running 'debdeltas' on a mirror that carries
'etch' and 'sid' for i386.
The backend is currently bsdiff.

Statistics are at
http://tonelli.sns.it/pub/mennucc1/debdelta/histograms
and daily logs at
http://tonelli.sns.it/pub/mennucc1/debdelta/daily-logs/

Currently, in my mirror
  96% of deltas are <=80% of the original .deb;
  of those, the average percent is 15%.
(15% is the ratio: total size of the debdeltas created /
 total size of the .debs processed.)
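Note that this figure is a ratio of totals, not a mean of per-package
percentages; the two can differ a lot when package sizes vary. The Python
sketch below shows the difference on two invented (delta size, deb size)
pairs; the numbers are made up for illustration and are not taken from my
mirror.

```python
def aggregate_ratio(pairs):
    """Total delta size divided by total .deb size."""
    pairs = list(pairs)
    total_delta = sum(delta for delta, _ in pairs)
    total_deb = sum(deb for _, deb in pairs)
    return total_delta / total_deb


def mean_of_ratios(pairs):
    """Plain average of the per-package ratios, shown for comparison."""
    ratios = [delta / deb for delta, deb in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical sizes in kB: a big package with a small delta,
# and a small package with a proportionally large delta.
sizes = [(100, 1000), (50, 100)]
print(aggregate_ratio(sizes))
print(mean_of_ratios(sizes))
```

With these invented sizes the ratio of totals is 150/1100, while the mean of
the two ratios is 0.3.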

In some cases, though, the benefit of using deltas is much greater than that:
for example, 'debdelta' can express the difference between 'tetex-doc'
 3.0-17 and 3.0-18 as a delta of a mere 260kB!

------------ exact patching and exact recompression

When debdelta recreates a .deb, it must be identical to the desired
one (otherwise APT will refuse it when checking signatures).

Suppose a .deb contains a huge file
 /usr/share/doc/foobar/document.info.gz
and this starts with an RCS tag; then each time it
is released, the compressed file will be different even though
only a few bytes were changed.

Other examples are manpages that start with a header
containing the version of the command.

So, to get good compression of the difference, I had
to be able to gunzip those files, diff them,
and gzip them back so that they are *exactly identical*.

For this reason, I studied gzip formats, and I wrote in debdelta
some Python code that does the trick (90% of the time...).
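The general idea can be sketched in Python as follows. This is NOT the code
used in debdelta: it is a minimal illustration that keeps the original gzip
header and trailer verbatim and only searches for a zlib compression level
that reproduces the deflate stream byte for byte. The function names are
hypothetical, and a real solution has to cope with more parameters (other
gzip implementations, memory level, strategy), which is part of why the
trick does not always succeed.

```python
import gzip
import zlib


def split_gzip(blob):
    """Split a single-member gzip file into (header, deflate body, trailer)."""
    if blob[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip file")
    flags = blob[3]
    pos = 10                      # fixed-size part of the header
    if flags & 0x04:              # FEXTRA: 2-byte length, then payload
        pos += 2 + int.from_bytes(blob[pos:pos + 2], "little")
    if flags & 0x08:              # FNAME: zero-terminated string
        pos = blob.index(b"\0", pos) + 1
    if flags & 0x10:              # FCOMMENT: zero-terminated string
        pos = blob.index(b"\0", pos) + 1
    if flags & 0x02:              # FHCRC: 2-byte header checksum
        pos += 2
    return blob[:pos], blob[pos:-8], blob[-8:]


def deflate(data, level):
    """Raw deflate (no zlib or gzip wrapper) at the given level."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def find_level(blob):
    """Return (level, data); level is None if no level reproduces blob."""
    _, body, _ = split_gzip(blob)
    data = zlib.decompress(body, -15)
    for level in range(1, 10):
        if deflate(data, level) == body:
            return level, data
    return None, data


def regzip(blob, data, level):
    """Rebuild the gzip file from uncompressed data and a known level."""
    header, _, trailer = split_gzip(blob)
    return header + deflate(data, level) + trailer


original = gzip.compress(b"$Id: document.info,v 1.7 $\n" + b"text\n" * 50)
level, data = find_level(original)
if level is not None:
    print(regzip(original, data, level) == original)
```

When a level is found, a delta only needs to carry the diff of the
uncompressed data, plus the header, the trailer and the level.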

------------- Tests and comparisons on backend binary delta difference compressors
see README.txt and all files in
http://tonelli.sns.it/pub/mennucc1/debdelta/tests

-------------- Speed

Warning: this section refers to experiments where the backend for
delta encoding was 'xdelta'; currently the default backend is
'bsdiff', which is much slower; work is in progress to find a
compromise.

On a desktop with an Athlon64 3000 CPU and an average hard disk,
$ debdelta mozilla-browser_1.7.8-1sarge3_i386.deb  mozilla-browser_1.7.8-1sarge6_i386.deb /tmp/m-b.debdelta
processes the 10MB of mozilla-browser in ~11sec,
that is, at a speed of ~900kB per second.

Then debpatch applies the above delta in 16sec,
at a speed of ~600kB per second.

Numbers drop on an old PC, or on a notebook (like mine, which has an
Athlon 1600MHz and slow disks), where data are chewed at ~200kB per
second. Still, since I have an ADSL line that downloads at
max 80kB per second, I benefit from downloading deltas.

In a theoretical example, indeed, downloading an 80MB package would
take 1000seconds; whereas downloading a delta that is 20% of 80MB
takes 200seconds, and then 80MB / (200kB/sec) = 400seconds to apply
it, for a total of 600seconds. So I may get a "virtual speed" of 80MB /
600sec = 130kB/sec .
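The arithmetic of this theoretical example can be replayed with a few lines
of Python. The function name 'virtual_speed' is hypothetical, 80MB is taken
as 80000kB, and patching time is assumed to be proportional to the size of
the new .deb, as in the example.

```python
def virtual_speed(deb_size, delta_size, download_speed, patch_speed):
    """Size of the .deb divided by the time needed to obtain it via a delta.

    Sizes are in kB, speeds in kB per second.
    """
    download_time = delta_size / download_speed   # 16000 / 80  = 200 s
    patch_time = deb_size / patch_speed           # 80000 / 200 = 400 s
    return deb_size / (download_time + patch_time)


# 80MB package, delta of 20%, 80kB/sec line, 200kB/sec patching.
print(round(virtual_speed(80000, 0.20 * 80000, 80, 200), 1))
```

This gives about 133kB/sec, which is the value quoted above in round
figures.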

Note that delta downloading and delta patching are done in parallel:
if 4 packages as above have to be downloaded, then the total
time for downloading the full debs would be 4000seconds, while the time
for downloading and applying the deltas in parallel may be as low as 1400seconds.

This is a real example of running 'debdelta-upgrade' :
 Looking for a delta for libc6 from 2.3.6-9 to 2.3.6-11
 Looking for a delta for udev from 0.092-2 to 0.093-1
 Patching done, time: 22sec, speed: 204kB/sec, result: libc6_2.3.6-11_i386.deb
 Patching done, time: 4sec, speed: 57kB/sec, result: udev_0.093-1_i386.deb
 Delta-upgrade download time 28sec speed 21.6k/sec
               total time: 53sec; virtual speed: 93.9k/sec.

(Note that the "virtual speed" of 93.9k/sec, while less than the
130kB/sec of the theoretical example above, is still more than the
80kB/sec that my ADSL line would allow.)

Of course the above is even better for people with fast disks and/or
slow modems.

Actually, an APT delta method could make a smart decision about how many
deltas to download, and in which order, to optimize the result (given
the delta sizes, the package sizes, the download speed and the
patching speed).
