SAMPLE RESOLUTION (OR QUANTIZATION) CONVERTER
=============================================


This device performs conversion between the different 'linear'
formats. It converts from/to signed/unsigned linear data with
8/16/20/24 bits per sample. It does not care about the sample rate or
whether the data is mono or stereo.

Converting between all these formats results in 64 different scenarios
(going from any of 8 possible formats to any of those 8). Instead of
having one big routine to handle all these cases (with if statements
and lots of bit shifting in the inner loop), we decided to write 64
swift little conversion functions, where the inner loop is just an
assignment between the two proper integer types. To make this task
more maintainable, the 64 routines are generated from the 8 basic
scenarios (4 combinations of signed/unsigned source and destination,
times 2 for up- or down-conversion, see below) by the Perl script
gen_algs.pl, also found in this directory. I have marked where the
generated code starts in mas_squant_device.c.
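
To give an idea of what such a small conversion routine can look
like, here is a minimal sketch. It is not copied from
mas_squant_device.c: the function names u8_to_s16 and s16_to_u16 and
their signatures are assumptions made for illustration, and the
generated code may well be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the generated routines may be named and
 * structured differently. */

/* Up-conversion: unsigned 8 bit -> signed 16 bit.
 * Remove the unsigned offset (128), then scale by 2^8. A multiply is
 * used because left-shifting a negative value is undefined in C. */
static void u8_to_s16(const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (int16_t)(((int)in[i] - 128) * 256);
}

/* Same resolution, sign change: signed 16 bit -> unsigned 16 bit.
 * Only the offset (32768) changes. */
static void s16_to_u16(const int16_t *in, uint16_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (uint16_t)((long)in[i] + 32768L);
}
```

Each loop body is a single assignment with no branching on the
format, which is the point of having one routine per scenario.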

The 20 and 24 bits per sample routines are untested, since we don't
have any practical examples where these formats are actually used. You
get (or provide) an array of 32-bit integers where the lower 20 or 24
bits are the sample value. I decided not to zero-pad the 'unused'
upper bits in the case of signed data. Rather, these bits are
sign-extended: all set to 0 for non-negative samples or all set to 1
for negative ones, so that reading the int32 directly gives the
correct sample value.
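
The layout of signed 20 or 24 bit samples inside a 32 bit integer can
be sketched as follows. The helper sign_extend_sample is hypothetical
and does not appear in the device; it only shows how the upper bits
end up all 0 or all 1 so that the int32 reads as the sample value.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * raw:  a word whose lower `bits` bits hold a two's complement sample
 * bits: 20 or 24
 * Returns the sample as an int32_t with the upper bits filled with
 * copies of the sign bit. */
static int32_t sign_extend_sample(uint32_t raw, unsigned bits)
{
    uint32_t mask = ((uint32_t)1 << bits) - 1u;   /* lower `bits` bits */
    uint32_t sign = (uint32_t)1 << (bits - 1u);   /* the sign bit      */
    uint32_t v = raw & mask;

    /* Flipping the sign bit and subtracting it again maps
     * 0 .. 2^bits-1 onto -2^(bits-1) .. 2^(bits-1)-1 without relying
     * on implementation-defined conversions. */
    return (int32_t)(v ^ sign) - (int32_t)sign;
}
```

For example, the 20 bit pattern 0xFFFFF becomes the int32 value -1
(all 32 bits set), while 0x7FFFF stays 524287 with the upper 12 bits
zero.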

When going from a higher to a lower resolution (e.g., from 16 to 8
bits per sample) you are throwing away information; audio quality will
degrade. In these cases the squant device uses a dithering algorithm
to add uniformly distributed random noise to what will become the
least significant bit before truncating. See, for example,
'Principles of Digital Audio' by Ken C. Pohlmann for why this is
good. In short, the added random noise causes the least significant
bit to jitter, and the ear averages this jitter over time, essentially
allowing you to encode fractions of a bit. The added noise is more
bearable than the distortion you get otherwise.
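
A minimal sketch of dithering before truncation, for the signed 16 to
signed 8 bit case, is given below. This is not the routine used by the
squant device: the function name s16_to_s8_dither, the use of rand()
as the noise source, and the clamping are all assumptions made for
illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not the device's own routine.
 * Adds noise spanning one output LSB (0..255 in 16 bit units), then
 * drops the lower 8 bits. */
static int8_t s16_to_s8_dither(int16_t x)
{
    int noise = rand() % 256;   /* roughly uniform in 0..255 */
    int v = (int)x + noise;
    int q;

    if (v > 32767)
        v = 32767;              /* clamp so the result fits in 8 bits */

    /* Floor division by 256. C division truncates toward zero, so
     * negative values with a remainder need a correction. */
    q = v / 256;
    if (v % 256 < 0)
        q--;

    return (int8_t)q;
}
```

An input that lies exactly on an 8 bit step, such as 1280 (5 * 256),
always comes out as 5. An input halfway between two steps, such as
1408, comes out as 5 about half the time and as 6 the other half, so
the time average is close to 5.5. That is the 'fraction of a bit'
mentioned above.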

The file dither.ps in this directory shows this by looking at the
power spectrum. The first plot is the input signal, a low-intensity
sine wave of 440 Hz (it occupies the lower 8 bits of a signed 16-bit
integer). The x axis shows the 'frequency' as a bin index (I chose a
file length of 10000 samples, so at a sample rate of 44100 Hz the
440 Hz peak will lie at 440*10000/44100 ~ 100). The y axis is the
power (Fourier amplitude squared) in arbitrary units. The second plot
shows what happens if the lower 8 bits are truncated. There are a lot
of unwanted spikes at odd multiples of the base frequency. The third
plot shows that adding a bit of random noise before truncating raises
the noise floor a bit, but gets rid of these peaks completely. A more
sophisticated approach would add dithering noise which has most of its
power at high frequencies, where the ear is less sensitive, rather
than simple random noise.
