You can get the latest version of FilterProxy, along with some instructions, at
http://draal.physics.wisc.edu/FilterProxy/

FilterProxy is a personal filtering proxy.  It is unique in that it allows
"Modules" to be installed that can perform arbitrary transformations on HTML
(or any other MIME type).  Currently it filters ads by rewriting HTML,
compresses HTML content (for a roughly 5-to-1 speedup on modems!), and
de-animates animated GIFs.  Configuration is done with web forms.

Modules currently supplied and tested are:
  * Rewrite: allows removal and modification of arbitrary parts of an
    HTML file using a configurable set of 'rules'.
  * XSLT: Extensible Stylesheet Language Transformations.  XSLT is a W3C
    recommendation: a language for transforming XML documents into
    other XML documents.
  * Header: can strip or add headers by regex.
  * Compress: uses gzip compression to compress HTML.  (4-5 times speed
    improvement for HTML; your browser uncompresses it.)
  * DeAnim: de-animates animated GIFs, and removes other "extension
    blocks", which often reduces the size of GIFs.
  * Skeleton: a barebones, heavily commented module for people wanting
    to write new modules.  See the TODO file for a list of work that
    needs to be done to extend this program.
  * ImageComp: a module which uses ImageMagick to recompress various image
    formats to reduce their size. (INCOMPLETE - Volunteers needed)
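
To get a feel for why the Compress module helps on slow links, here is a
minimal sketch that gzips a sample of repetitive HTML and compares the sizes.
The sample markup is made up for illustration, and the sketch only exercises
the standard gzip tool, not FilterProxy's own code.  Real pages will compress
more or less well depending on their content.

```shell
# Build a made-up, repetitive HTML table to stand in for a real page.
html=$(
    i=0
    while [ "$i" -lt 200 ]; do
        printf '<tr><td class="cell">row %d</td></tr>\n' "$i"
        i=$((i + 1))
    done
)

# Byte counts before and after gzip (tr strips padding some wc versions add).
raw=$(printf '%s\n' "$html" | wc -c | tr -d ' ')
packed=$(printf '%s\n' "$html" | gzip -c | wc -c | tr -d ' ')

echo "raw: $raw bytes, gzipped: $packed bytes"
```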

Where to run FilterProxy:
-------------------------
  There are two basic ways to run FilterProxy.  

  The first is where FilterProxy is running on the same machine you are
  browsing from, and that machine is connected to the net via a slow interface
  (e.g. a modem).  In this case it makes sense to use the following modules:
      Rewrite
      XSLT
      Mirror (when/if written)
  I also suggest enabling "localhost only" in this mode (for security).

  The second is where FilterProxy is running on a computer with a relatively
  fast connection, and you are using it from a different computer, over a
  modem.  In this case it makes sense to use the following modules:
      Compress (this will give you a ~5x speed improvement for html!)
      Rewrite
      XSLT

  A variation on the first case is where FilterProxy is running on a computer
  with a fast connection, and you are browsing from the same computer.  The
  module choices are basically the same as in the first case.  Again, using
  "localhost only" is recommended.

  Make sure to install FilterProxy on a relatively fast computer.  Don't
  put it on your OpenBSD firewall that's got a Pentium 90 in it.  Parsing
  and filtering HTML is a computationally intensive task, and requires
  a reasonable amount of CPU.  On my 533 MHz Alpha, most pages get filtered
  in under 0.5 seconds.  On an 800 MHz Athlon I have access to, most
  pages get filtered in under 0.2 seconds.  But on an older computer
  it could take many seconds, introducing a noticeable delay.  (This
  applies only to HTML; images are usually very fast.)

  If you're installing from the rpm, FilterProxy will install itself in
  /home/filterproxy, create a user for itself, and create an init script
  /etc/rc.d/init.d/filterproxy.  If you wish to start FilterProxy at boot, you
  should create a link to this script from /etc/rc5.d/ (or whatever your default
  runlevel directory is).
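
  Creating that link might look like the sketch below.  It rehearses the
  step in a scratch directory so that nothing under /etc is touched.  The link
  name S99filterproxy is an assumption based on the usual SysV naming
  convention (an "S" plus a two-digit ordering number); check the other links
  in your runlevel directory before picking a number.

```shell
# Rehearse the link in a scratch directory so nothing under /etc is touched.
root=$(mktemp -d)
mkdir -p "$root/etc/rc.d/init.d" "$root/etc/rc5.d"
touch "$root/etc/rc.d/init.d/filterproxy"      # stand-in for the real script

# S99 is an assumed name: SysV init runs S* links in numeric order at boot.
ln -s "$root/etc/rc.d/init.d/filterproxy" "$root/etc/rc5.d/S99filterproxy"
ls -l "$root/etc/rc5.d"

# On a real system (as root), the equivalent would be:
#   ln -s /etc/rc.d/init.d/filterproxy /etc/rc5.d/S99filterproxy
```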

FilterProxy also supports the following command line options:
# FilterProxy.pl -h
    Options recognized by FilterProxy:
      -h          Print this help message
      -k          Kill an already running copy of FilterProxy
      -f <file>   Specify an alternate config file 
                    (default is `pwd`/FilterProxy.conf)
      -p <port>   Specify the port to which FilterProxy will bind 
                    (default is 8888)
      -n          Do not daemonize: stay connected to the terminal from which
                    it was started and print debugging messages.
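
As an illustration of combining these options, the sketch below runs
FilterProxy in the foreground on the default port with an explicit config
file.  The install directory is an assumption (the rpm location mentioned
above, overridable through a hypothetical FP_DIR variable), as is the idea
that FilterProxy.pl sits directly in it.  If the script is not found there,
the sketch only prints the command it would have run.

```shell
# Assumed install location; FP_DIR is a hypothetical override for this sketch.
fp_dir=${FP_DIR:-/home/filterproxy}
port=8888
conf="$fp_dir/FilterProxy.conf"

# -n stays in the foreground with debug output, -p sets the port,
# -f names the config file (all taken from the help text above).
if [ -x "$fp_dir/FilterProxy.pl" ]; then
    "$fp_dir/FilterProxy.pl" -n -p "$port" -f "$conf"
else
    echo "would run: $fp_dir/FilterProxy.pl -n -p $port -f $conf"
fi
```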

If you wish to use *another* proxy in addition to FilterProxy, you may
set the environment variable http_proxy to point to the other proxy.
It is also possible to set this from the CGI config page, FilterProxy.html.
For instance, if your ISP runs a caching proxy, set something like:
    # setenv http_proxy http://your.isp.here:1234                  (csh syntax)
    # http_proxy=http://your.isp.here:1234; export http_proxy      (sh syntax)
Where 1234 is the port your other proxy runs on, and your.isp.here is
the hostname or IP address of the proxy.  (I have not tested this very well,
but I have reports that it works as of ~0.15.)  If the upstream proxy requires
authentication, this information can be entered on the main FilterProxy
config page.  (Only Basic authentication works right now.)
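
One detail worth checking in sh-style shells: a plain assignment is visible
only to the current shell, so a program started afterwards will not see
http_proxy unless the variable is exported.  The sketch below shows the
difference using a child sh process as a stand-in for FilterProxy; the proxy
address is the same placeholder used above.

```shell
unset http_proxy                        # start from a clean slate

# A plain assignment is visible to this shell only, not to child processes.
http_proxy=http://your.isp.here:1234
before=$(sh -c 'echo "${http_proxy:-unset}"')

# After export, children (such as FilterProxy) inherit the variable.
export http_proxy
after=$(sh -c 'echo "${http_proxy:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```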

The reason I wrote FilterProxy is to fix some problems with the web (in
general) and brain dead web-site designers (specifically).  Modules that
I would like to see in the future:
    Cookie      Filter cookies by server (i.e. do not send any cookies to
                ad servers, while still allowing cookies for other
                sites).  Allow for sophisticated cookie management
                (check out HTTP::Cookies).
    Anchorizer  Wrap identifiable URLs in a web page in <a href="..."> tags,
                when those URLs are not already links.
    Clean       Clean up HTML (specifically, remove MS's attempts at
                redefining ASCII by adding forward and back quotes,
                which appear in many browsers as '?') (use the HTML::Clean
                package).
    Mirror      Keep a local copy of all images on often visited sites.

There are other things this program could do, if extended properly:
1) automatically download ads on sites you like to visit (without
  displaying them to you) -- this gets money for the visited site.
2) ...I'll think of more...

Thanks to the following folks for their help and ideas:

  Abigail, author of abiprox: http://language.perl.com/misc/abiprox/
  Randal L. Schwartz, author of the WebTechniques column.  Of particular interest
    are numbers 11 and 34, upon which FilterProxy is partially based.
    http://www.stonehenge.com/merlyn/WebTechniques/
  Robert W. Cunningham <rcunning@acm.org> who has been very forthcoming with
    ideas and helping me test FilterProxy.
  Steve Sekula, who has also been helpful in testing.
  John Conneely <john@figaro.org> for rpm spec files, Header module updates.
  David MacKenzie <djm@web.us.uu.net> for rpm spec file additions.
  Richard Tibbetts <tibbetts@MIT.EDU> for "localhost only" option.
  Seth Golub <seth@aigeek.com> for some patches.
  Vineet Kumar <vineet@doorstop.net> for patches related to transparent proxying.
  Vladimir N Goncharov <vovva@chat.ru> for upstream proxy patch.
  Danek Duvall <duvall@emufarm.org> for some very useful discussions.
  Members of debian-devel and debian-legal for helpful discussions.
  Kenneth Vestergaard Schmidt <kenneth@bitnisse.dk> for packaging FilterProxy
    for Debian, and including it in the main Debian distribution.  (apt-get it!)
  Siggi Langauf <siggi@debian.org> for many interesting discussions, and some
    very good ideas.  Siggi also came up with the JavaScript bookmarks.
  Baxter Rogers <bprogers@students.wisc.edu> for pointing out some bugs, and
    not being afraid to pester me ;).
  Mario Lang <lang@zid.TUGraz.at> for the XSLT module.

Bob McElrath <mcelrath+filterproxy@draal.physics.wisc.edu>.

WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
  FilterProxy is not a completed work, and should be considered "alpha"
  software.  It has bugs.  These bugs may set your cat on fire.  They may crash
  your computer, erase your hard drive, mail porn to your grandmother, or other
  nasty things.  You have been warned.

  YOU are responsible for all content that passes through FilterProxy.  Do not
  filter content for other people without their knowledge and consent.  This
  software is not intended to be a "netnanny", filter dirty words, or prevent
  your thirteen-year-old from seeing pornography.  There are lots of other
  filters out there for that -- go find one if you want one.

  A note about filtered content:
  ------------------------------

  It is clear to me that rewriting web pages for yourself should not pose any
  kind of legal quandary.  That is, removing banner ads from HTML is perfectly
  legal, just as scribbling in a book you have bought, or ripping pages out of
  a book you have bought is legal.  

  What is *not* legal is redistributing modified content.  In most cases, HTML
  is copyrighted by the site you downloaded it from, and I doubt they would be
  very happy if you started redistributing modified copies of their site.

  DO NOT install FilterProxy to filter other people's content (without their
  knowledge and consent).  You may be liable for copyright infringement on the
  pages filtered.  FilterProxy is meant to be a PERSONAL ad-filtering proxy. 

WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING

