  Tiny Programs

The programs listed here are provided as exhibitions of just how compressed
a Linux ELF executable can be and still function.

Take a look through /usr/bin: how big is the smallest ELF executable in
there? Now try writing the smallest hello-world program you can. Can you
get it under 1K? Can you get it under 100 bytes?

This is precisely what I set out to do one day, and I ended up with a minor
obsession over creating the smallest possible executables.

The distribution comes with preassembled binaries along with the assembly
source code. If you wish to be able to rebuild them yourself, you will need
to have a copy of Nasm, the excellent x86 assembler. Nasm's home page is at
http://www.web-sites.co.uk/nasm/.
___________________________________________________________________________

  Some Introductory Blather Which You Should Feel Free to Skip if You Want
  to Get to the Good Stuff

Your average ELF executable, even after being stripped, contains a fair bit
of overhead, some of it completely superfluous. And when you rip out
everything that doesn't contribute to a program's memory image, much of
what remains, beside the program code itself, is information needed by the
dynamic linker. This is of course very important information to have,
normally -- but if the program makes its own system calls, it can dispense
with that overhead as well.

Of course, there are few tools out there that will create an executable
that has none of this extra machinery. But if one is willing to define the
file entirely by hand, an executable can be created with the absolute
minimum contents, including only the information that Linux needs in order
to get it into memory and start it running.

The minimum information an ELF executable file needs to have is an ELF
header and a program header table with at least one entry. (For more
information about these entities, consult the ELF standard. You can
download a copy of it as a Postscript document from
ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz, or from the
equivalent location of any repository containing GNU's gcc package for
Linux. Or, you can also retrieve a transcription of the document as a
flat-text file from http://www.muppetlabs.com/~breadbox/software/ELF.txt.)
But, even in these two remaining structures, there are fields that aren't
used, particularly with a bare-bones executable. Furthermore, there are
other fields in these structures which do serve a purpose, but which are
not, at this point in time, actually examined by the Linux OS. So, these
fields can be used, if the programmer is desperate enough, to hold other
bits of data, or even small amounts of code....

Let me pause for a moment and make this perfectly clear: I emphatically do
not advocate the wilfull, widespread disregard of standards. Certainly,
just because the ELF standard says that a given field will hold a certain
value does not automatically mean that Linux ought to examine that field
and reject any ELF file with something else there. But by the same token,
just because (the current version of) Linux ignores the contents of that
field does NOT give programmers implicit permission to fill it in with
anything they like. When programmers en masse adhere to existing
implementations instead of existing standards is exactly when a
well-written standard becomes an obstacle instead of a tool, when future
standards are forced to canonize ludicrous practices in the name of
backward-compatibility, and when everything generally goes to hell. So:
where these programs violate requirements of the ELF standard, do not take
them seriously. These programs have been offered up as entertainment, and
perhaps as an education in existing practice. But please do not seek to
emulate such behavior in other, more serious programming work.

In fact, not every program here is portable among all the 2.2 kernels.
Beginning with 2.2.17 (?), Linux began verifying one of fields (namely
e_phentsize) in the ELF header. This field is ignored in prior 2.2 kernels,
and also in the 2.4 kernel. I'm unclear as to reason for the divergence,
but that is the risk one takes when ignoring standards. In order to
accommodate users who are running the pickier kernels, I have supplied
alternate versions of the offending programs. Since these are written
specifically for later 2.2 kernels, I have managed to keep them at their
original size by taking advantage of yet another assumption not guaranteed
to be portable -- namely, that all the general registers are initialized to
zero on program startup. (Not only is this not guaranteed, it is untrue for
2.0 kernels, where only edx is consistently cleared.)

Of course, violating requirements of the ELF standard is only one way in
which these programs achieve such compression. Much else regarding the
organization of the ELF structures in these programs is admittedly
unorthodox, but nonetheless in complete adherence to the standard. Most
important of all, of course, is the creative reworking of algorithms.
Finally, as with any CISC assembly programming, there are also plenty of
uncommon instructions used to accomplish common tasks. (This can be an
especially fruitful avenue of exploration on the Intel x86 as so many of
the short instructions date back to the 8086, or worse, and are usually
considered to be completely obsolescent, as Intel has made no attempt to
keep their timings abreast of the rest of the instruction set. These days,
one usually programs in assembly only in order to optimize the running
time, in which case it is very important to stick with the core instruction
set as much as possible. But when optimizing for space instead of speed,
one suddenly finds new uses for all those obscure one- and two-byte
instructions.)

The path of optimizing an ELF executable for size is described in my essay
"A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux",
at http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.
___________________________________________________________________________

true.asm

This program returns an exit code of either zero or one, depending on
whether it is invoked with the name "true" or "false", respectively. This
one is the runt of the litter. Its size is 45 (count them) bytes. I believe
that this is the smallest it is possible for a Linux ELF executable to be.

hello.asm
hello-2.2.17.asm

The final version of the program that started me off on this whole pursuit:
hello world. It is 59 bytes long. This may well be the densest one here.
With the program header table overlaid on top of the ELF header, and
program itself running through both of them, some of the bytes have no less
than three completely different purposes.

fgconsole.asm
fgconsole-2.2.17.asm

fgconsole was the smallest ELF executable in my old /usr/bin directory
(2640 bytes), so I decided to see how small my own version could be. I
managed to fit it into 72 bytes. fgconsole simply prints the number of the
currently active virtual console. If run from somewhere other than the
console, it returns a non-zero exit code.

keepalive.asm
keepalive-2.2.17.asm

Here's a program that I actually use, quite frequently in fact. It simply
runs forever, printing a bell character every five minutes or so. I keep it
in the background after telnetting from another machine that times out
connections after 15 minutes of idle time. At 57 bytes, it's a bit longer
than the one-line shell script I originally used, but on the plus side it
doesn't take up an extra process in order to sleep.

bf.asm

Brainfuck is a very simplistic programming language, which was invented by
Urban Mueller solely for the purpose of being able to create a compiler
that was less than 256 bytes in size, for the Amiga OS. (More information
about Brainfuck can be found at http://www.muppetlabs.com/~breadbox/bf/.) I
eventually decided to take up the challenge as well, and create a Brainfuck
compiler for Linux using less than 256 bytes. The compiler works as a
filter: redirect the source code to the compiler's input and the compiler's
output to a file. (And don't forget to chmod +x the output file.) At the
beginning I didn't expect that I would actually succeed, since I thought I
would need almost half of that just for the headers. I think my first cut
was 458 bytes in size. I was quite pleased with myself when I trimmed that
down to less than 300 bytes, ecstatic when I finally reached the 255-byte
goal, and downright triumphant when I later got it to work in 238 bytes. By
the time I had a 198-byte version, I was just plain dumbfounded. It's now
at 171 bytes. And though I can't quite believe it, it works just as well as
the first one did. As useless as it is, I think this one would have to be
the crown jewel of this little collection. It is quite possibly also the
ugliest and most opaque program I have ever written.

hexdump.asm

This is your classic hexdump program. I wrote it one day in a fit of pique,
after trying (and failing) to convince the standard Linux hexdump program
to use the canonical hexdump formatting style. This program, and the ones
that follow, are different from the previous ones in that these adhere 100%
to the requirements of the ELF standard. The reason I did this is because,
unlike the previous programs, these can actually accomplish something
useful. I wanted to be able to keep them around and know that they would
continue to work. This one is actually larger than earlier versions. The
first version only read from stdin, but I found myself using this program
often enough to be annoyed by this limitation, so I added command-line
parsing. As a result of these extra goals, hexdump is a whopping 232 bytes.

ls.asm

After hexdump, I came into contact with Konstantin Boldyshev, who heads the
asumutils project (you can visit the asmutils home page at
http://www.linuxassembly.org/asmutils.html), and I got involved in creating
tiny programs that were actually useful. ls is my first real achievement
along these lines. It is 1017 bytes in size, and sports many of the
important command-line options:

-C  columnar output; default if stdout is a tty
-1  one file per output line; default if stdout is not a tty
-l  "long" output
-F  show file type suffixes
-i  show inode numbers
-s  show size of files in 1K-blocks
-d  don't expand directories
-a  show all (hidden) files
-B  don't show files ending in ~ (backups)
-N  don't replace non-graphic characters with a question mark
-b  show non-graphic characters as octal escape sequences
-R  recursively list contents of subdirectories

(I'm especially proud of getting that last feature to work.)

The "long" output format only displays numeric user IDs, and it displays
timestamps as an age, instead of the actual timestamp. There is no sorting
capability, and the columnar output is much less intelligent than GNU's
(though it looks pretty good most of the time). Beyond that, however, it
conforms pretty closely to the standard ls program we all know and love.

date.asm

date was my the next creation after ls for the asmutils project. (Note that
the programs here are slightly different from the ones in asmutils. The
asmutils programs are portable to other x86-based OSes, such as BSD. The
versions you see here are strictly for Linux, and this allows me to squeeze
out a couple more bytes here and there.) This version of date doesn't have
clock-setting capabilities; it just does date output formatting. It honors
the local time zone, if set, as well as the LANG environment variable. It
also contains its own default locale if LANG isn't set, which makes it a
bit large -- 1448 bytes, to be precise.

factor.asm

This is an implementation of the standard Unix program. It displays the
prime factors of the integers specified on the command line, or on standard
input if no arguments are given. This program is the oddball of this
collection. Instead of simply trying to achieve the smallest possible size,
I decided to shoot for real portability, for a change. This program not
only conforms to the requirements of the ELF standard, it is also the only
program in this collection that actually uses functions from libc (as
opposed to making its own system calls). Thus it should continue to work
with any future version of Linux, as long as new versions of libc and the
ELF standard remain backwards-compatible. In addition, I decided to try to
balance optimizing for both size and speed. This program takes advantage of
the fact that floating-point division can be done in parallel with integer
instructions on all Pentium processors. As a result it is significantly
faster than the GNU implementation of factor for Linux. Finally, I added
online help and an appropriate error message for invalid input. The
resulting program therefore arguably stands as a completely functional
replacement for the existing utility. Its size is 896 bytes.

snake.asm

By now you're probably thinking: "Okay, the first couple of programs were
impressive, but now it's just repetitive. Isn't there anything interesting
in here?" Well, here's something to reward you for reading all the way to
the end: a game! Snake is a classic computer game -- in fact it's so old
that you can hardly find it anymore in its original form. I originally
wrote this as an Easter Egg for a contract involving text-terminal
interfaces. Now it's available to all as a tiny program. In case you've
never played it, the object of the game is to guide your snake around the
playing field, eating the food blocks that appear at random locations, and
dodging the walls and your own tail. As you eat the blocks, your score
increases but so does your length. The program should work on any
VT100-compatible terminal (although see the program comments regarding the
line-drawing character set -- not all Linux console fonts provide correct
support for them). Plus, at 1496 bytes, it's the perfect addition to a
rescue floppy ... when you've just lost all your data, what's better than a
game to take your mind off your troubles?
