===================================
PyPy - Just-In-Time Specialization
===================================


Warning: These are just a few notes quickly thrown together, to be
clarified and expanded.

Draft
=========================

Introduction
------------

Our current "state of the art" in the area is represented by the test
``pypy.jit.test.test_hint_timeshift.test_arith_plus_minus()``.  It is a
really tiny interpreter which gets turned into a compiler.  Here is its
complete source::

    def ll_plus_minus(encoded_insn, nb_insn, x, y):
        acc = x
        pc = 0
        while pc < nb_insn:
            op = (encoded_insn >> (pc*4)) & 0xF
            op = hint(op, concrete=True)           # <-----
            if op == 0xA:
                acc += y
            elif op == 0x5:
                acc -= y
            pc += 1
        return acc

This interpreter goes through several transformations, which you can
follow in Pygame::

    py.test test_hint_timeshift.py -k test_arith_plus_minus --view

What this does is turn (the graph of) the above interpreter into (the
graph of) a compiler.  This compiler takes a user program as input --
i.e. an ``encoded_insn`` and ``nb_insn`` -- and produces as output a new
graph, which is the compiled version of the user program.  The new
output graph is called the *residual* graph.  It takes ``x`` and ``y``
as input.
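
To make the input/output relationship concrete, here is a plain-Python
sketch that is not part of the test itself.  It assumes that ``hint()``
behaves as an identity function when the interpreter runs untransformed.
The names ``encode()`` and ``residual_plus_plus_minus()`` are hypothetical
and introduced only for illustration; the latter is written by hand to show
what the residual graph of one particular user program would compute.

```python
def hint(value, **flags):
    # Assumption: run as ordinary Python, hint() just returns its argument.
    return value

def ll_plus_minus(encoded_insn, nb_insn, x, y):
    acc = x
    pc = 0
    while pc < nb_insn:
        op = (encoded_insn >> (pc * 4)) & 0xF
        op = hint(op, concrete=True)
        if op == 0xA:
            acc += y
        elif op == 0x5:
            acc -= y
        pc += 1
    return acc

def encode(ops):
    # Hypothetical helper: pack 4-bit opcodes, first instruction in the
    # lowest bits, matching the decoding done by the interpreter.
    encoded = 0
    for position, op in enumerate(ops):
        encoded |= op << (position * 4)
    return encoded

def residual_plus_plus_minus(x, y):
    # Hand-written equivalent of the residual graph for the user
    # program [plus, plus, minus]: no decoding and no loop are left.
    acc = x
    acc = acc + y
    acc = acc + y
    acc = acc - y
    return acc

encoded_insn = encode([0xA, 0xA, 0x5])
interpreted = ll_plus_minus(encoded_insn, 3, 10, 3)
compiled = residual_plus_plus_minus(10, 3)
```

Both ``interpreted`` and ``compiled`` hold the same value: the residual
function does only the arithmetic that depends on ``x`` and ``y``.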

This generated compiler is not "just-in-time" in any sense.  It is a
regular compiler, for now.

Hint-annotator
--------------

Let's follow how the interpreter is turned into a compiler.  First, the
source of the interpreter is turned into low-level graphs in the usual
way (it is actually already a low-level function).  This low-level graph
then goes through a pass called the "hint-annotator", whose goal is to
give colors -- red, green and blue -- to each variable in the graph.
The color represents the "time" at which the value of a variable is
expected to be known, when the interpreter works as a compiler.  In the
above example, variables like ``pc`` and ``encoded_insn`` need to be
known to the compiler -- otherwise, it wouldn't even know which program
it must compile.  These variables need to be *green*.  Variables like
``x`` and ``acc``, on the other hand, are expected to appear in the
residual graph; they need to be *red*.

The color of each variable is derived based on the hint (see the
``<-----`` line in the source): the hint() forces ``op`` to be a green
variable, along with any *previous* variable that is essential to
compute the value of ``op``.  The hint-annotator computes dependencies
and back-propagates "greenness" from hint() calls.
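
The back-propagation can be pictured with a toy model.  This is only a
sketch: the real hint-annotator works on flow graphs, not on a hand-written
dictionary, and ``back_propagate_green()`` and ``depends_on`` are
hypothetical names.  The dependencies listed are those readable from the
source of ``ll_plus_minus()``.

```python
def back_propagate_green(depends_on, hinted):
    """Return the variables forced green by the hinted ones (toy model)."""
    green = set()
    pending = list(hinted)
    while pending:
        var = pending.pop()
        if var in green:
            continue
        green.add(var)
        # Everything needed to compute a green variable must be green too.
        pending.extend(depends_on.get(var, ()))
    return green

# Hand-written data dependencies of ll_plus_minus() (an assumption of
# this sketch; control dependencies such as nb_insn are not modelled).
depends_on = {
    "op": ["encoded_insn", "pc"],
    "pc": ["pc"],
    "acc": ["acc", "x", "y"],
}

green = back_propagate_green(depends_on, ["op"])
red = set(["acc", "x", "y"]) - green
```

Here ``green`` ends up containing ``op``, ``encoded_insn`` and ``pc``,
while ``acc``, ``x`` and ``y`` are untouched by the propagation and stay
red.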

The hint-annotator is implemented on top of the normal annotator; it's
in hintannotator.py, hintmodel.py, hintbookkeeper.py, hintcontainer.py
and hintvlist.py.  The latter two files are concerned with the *blue*
variables, which are variables that contain pointers to structures of a
mixed kind: structures which are themselves -- as containers -- known to
the compiler, i.e. green, but whose fields may not all be known to the
compiler.  There is no blue variable in the code above, but the stack of
a stack-based interpreter is an example: the "shape" of the stack is
known to the compiler when compiling any bytecode position, but the
actual run-time values in the stack are not.  The hint-annotator can now
handle many cases of blue structures and arrays.  For low-level
structures and arrays that actually correspond to RPython lists,
hintvlist.py recognizes the RPython-level operations and handles them
directly -- this avoids problems with low-level details like
over-allocation, which causes several identical RPython lists to look
different when represented as low-level structs and arrays.
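
The stack example can be illustrated with a toy compiler for a
hypothetical two-instruction stack language (``load`` and ``add`` are made
up for this sketch, as is ``compile_stack_program()``).  The Python list
plays the role of the blue container: its depth is known while compiling,
but its items are only names of run-time values.

```python
def compile_stack_program(bytecode):
    """Toy compiler for a stack language; returns residual operations."""
    stack = []      # "blue": the shape is known now, the contents are not
    residual = []   # operations left to be executed at run time
    counter = 0
    for insn in bytecode:
        if insn[0] == "load":
            # Pushing only records a name; nothing is emitted.
            stack.append(insn[1])
        elif insn[0] == "add":
            right = stack.pop()
            left = stack.pop()
            result = "v%d" % counter
            counter += 1
            residual.append(("int_add", left, right, result))
            stack.append(result)
    return residual, stack

bytecode = [("load", "x"), ("load", "y"), ("add",), ("load", "x"), ("add",)]
residual, final_stack = compile_stack_program(bytecode)
```

The residual code contains two ``int_add`` operations and no stack
manipulation at all: pushes and pops were performed by the compiler itself.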

Timeshifter
-----------

Once the graph has been colored, the "timeshifter" comes in.  This
tool -- loosely based on the normal RTyper -- transforms the colored
low-level graphs into the graphs of the compiler.  Broadly speaking,
this is done by transforming operations annotated with red variables
into operations that will generate the original operation.  Indeed, red
variables are the variables whose run-time content is unknown to the
compiler.  So for example, if the arguments of an ``int_add`` have been
annotated as red, it means that the real value of these variables will
not be known to the compiler; when the compiler actually runs, all it
can do is generate a new ``int_add`` operation into the residual graph.

In the example above, only ``acc += y`` and ``acc -= y`` are annotated
with red arguments.  After hint-rtyping, the ll_plus_minus() graph --
which is now the graph of a compiler -- is mostly unchanged except for
these two operations, which are replaced by a few operations which call
helpers; when the graph of the now-compiler is running, these helpers
will produce new ``int_add`` and ``int_sub`` operations.
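
As a rough picture of the result, the following sketch writes by hand what
the now-compiler does.  It is not the timeshifter's actual output:
``ResidualBuilder``, ``genop()`` and ``run_residual()`` are hypothetical
stand-ins for the real helpers, and residual variables are represented by
plain strings.

```python
class ResidualBuilder:
    """Hypothetical stand-in for the helpers called by the compiler."""
    def __init__(self):
        self.operations = []
        self.counter = 0

    def genop(self, opname, arg1, arg2):
        # Emit one residual operation and return the name of its result.
        result = "v%d" % self.counter
        self.counter += 1
        self.operations.append((opname, arg1, arg2, result))
        return result

def compile_plus_minus(encoded_insn, nb_insn, builder):
    acc = "x"   # red: only the *name* of a residual variable is known
    pc = 0      # green: an ordinary value, computed right now
    while pc < nb_insn:
        op = (encoded_insn >> (pc * 4)) & 0xF
        if op == 0xA:
            acc = builder.genop("int_add", acc, "y")   # was: acc += y
        elif op == 0x5:
            acc = builder.genop("int_sub", acc, "y")   # was: acc -= y
        pc += 1
    return acc

def run_residual(operations, result_var, x, y):
    # Tiny evaluator, used only to check the generated operations.
    env = {"x": x, "y": y}
    for opname, arg1, arg2, result in operations:
        if opname == "int_add":
            env[result] = env[arg1] + env[arg2]
        else:
            env[result] = env[arg1] - env[arg2]
    return env[result_var]

builder = ResidualBuilder()
# 0x5AA encodes the program [plus, plus, minus], lowest nibble first.
result_var = compile_plus_minus(0x5AA, 3, builder)
value = run_residual(builder.operations, result_var, 10, 3)
```

The green parts (decoding ``op``, incrementing ``pc``, the loop itself) run
unchanged inside the compiler; only the two red operations turn into code
generation.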

Merging and bookkeeping
-----------------------

XXX the following is not the way it works currently, but rather a
proposal for how it might work -- although it's all open to changes.  It
is a mix of how Psyco and the Flow Object Space work.

Unlike ll_plus_minus() above, any realistic interpreter needs to handle
two complications:

1. Jumping or looping opcodes.  When looping back to a previous bytecode
   position, the now-compiler must not continue to compile again and again;
   it must generate a loop in the residual graph as well, and stop
   compiling.

2. Conditional branches.  Unless the condition is known at compile time,
   the compiler must generate a branch in the residual graph and compile
   both paths.

Keep in mind that this is all about how the interpreter is transformed
to become a compiler.  Unless explicitly specified, I am not talking about
the interpreted user program here.

As a reminder, this is handled in the Flow Space as follows:

1. When a residual operation is about to be generated, we check if the
   bytecode position has closed back to an already-seen position.  To do so,
   for each bytecode position we save a "state".  The state remembers
   the bytecode interpreter's frame state, as a pattern of Variables and
   Constants; in addition, the state points to the residual block that
   corresponded to that position.

2. A branch forces the current block to fork in two "EggBlocks".  This
   makes a tree with a "SpamBlock" at the root and two EggBlock children,
   which themselves might again have two EggBlock children, and so on.
   The Flow Space resumes its analysis from the start of the root
   SpamBlock and explores each branch of this tree in turn.  Unexplored
   branches are remembered for later.

In Psyco:

1. A state is saved at the beginning of each bytecode, as in the Flow
   Space.  (There are actually memory-saving tricks to avoid saving a
   state for each and every bytecode.)  We close a loop in the residual
   graph as soon as we reach an already-seen bytecode position, but
   only if the states are "compatible enough".  Indeed, as in the PyPy
   JIT, Psyco states store more than just Variable/Constant: they store
   "blue" containers detailing the content of individual fields.  Blue
   containers of too-different shapes are not "compatible enough".  In
   addition, some Constants can be marked as "fixed" to prevent them
   from being merged with different Constants and becoming Variables.

2. Branching in Psyco works as in the Flow Space, with the exception that
   each condition has got a "likely" and a "less likely" outcome.  The
   compilation follows the "likely" path only.  The "less likely" paths
   are only explored if at run-time the execution actually reaches them.
   (When it does, the same trick as in the Flow Space is used:
   compilation restarts from the root SpamBlock and follows the complete
   branch of the EggBlock tree.)

3. A different kind of branching that doesn't occur in the Flow Space:
   promoting a value from Variable to Constant.  This is used e.g. when
   an indirect function call is about to be performed.  A typical
   example is to call a PyTypeObject's slot based on the type of a
   PyObject instance.  In this case, Psyco considers the PyObject's
   ob_type field as a Variable, which it turns into a Constant.
   Conceptually, the current residual block is ended with a "switch"
   and every time a different run-time value reaches this point, a new
   case is compiled and added to the switch.  (As in 2., the compilation
   is restarted from the root SpamBlock until it reaches that point
   again.)
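
The growing "switch" of point 3 can be modelled in a few lines.  This is a
conceptual sketch only, not Psyco's implementation: ``PromotionSwitch`` and
``compile_repr_for_type()`` are hypothetical names, Python closures stand in
for compiled code, and restarting the compilation from the root block is
not modelled.

```python
class PromotionSwitch:
    """Toy residual "switch" that grows one case per run-time value seen."""
    def __init__(self, compile_case):
        self.compile_case = compile_case   # called with the promoted value
        self.cases = {}

    def __call__(self, runtime_value, *args):
        if runtime_value not in self.cases:
            # A new value reached the switch: compile a case in which
            # this value is a Constant.
            self.cases[runtime_value] = self.compile_case(runtime_value)
        return self.cases[runtime_value](*args)

compiled_for = []   # records which cases had to be compiled

def compile_repr_for_type(tp):
    # Stand-in for compiling code specialized on a known type.
    compiled_for.append(tp)
    if tp is int:
        return lambda obj: "int:%d" % obj
    return lambda obj: "other:%s" % (obj,)

switch = PromotionSwitch(compile_repr_for_type)
results = [switch(type(obj), obj) for obj in (1, 2, "a")]
```

The second integer reuses the case compiled for the first one, so only two
cases are compiled for the three calls.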

The "tree of EggBlocks" approach doesn't work too well in general.  For
example, it unrolls loops infinitely if they are not loops in the
bytecode but loops in the implementation of a single opcode (we had this
problem working on the annotator in Vilnius).

The current preliminary work on the timeshifter turns the interpreter
into a compiler that saves its state at *all* join-points permanently.  This
makes sure that loops are not unexpectedly unrolled, and that the code
that follows an if/else is not duplicated (as it would be in the
tree-of-EggBlocks approach).  It is also inefficient and a
perfect way to make the memory usage explode.

I think we could try to target the following model -- which also has the
advantage that simple calls in the interpreter are still simple calls in
the compiler, as in Psyco:

1. We only save the state at one point.  We probably need a hint to
   specify where this point is -- at the beginning of the bytecode
   interpreter loop.  It is easy to know in advance which information
   needs to be stored in each state: the tuple of green variables is used
   as a key in a global dict; the content of the red variables is stored
   as a state under this key.  Several states can be stored under the same
   key if they are not "compatible enough".

2. Branching: a possible approach would be to try to have the compiler
   produce a residual graph that has the same shape as the original
   graph in the interpreter, at least as far as the conditions are
   unknown at compile-time.  (This would remove the branches whose
   conditions are known at compile time, and unroll all-green loops
   like the bytecode interpreter loop itself.)

3. Promoting a Variable to a Constant, or, in colored terms, a red
   variable to a green one (using a hint which we have not implemented
   so far): this is the case where we have no choice but to suspend the
   compilation, and wait for execution to provide real values.  We will
   implement this case later, but I mention it now because it seems that
   there are solutions compatible with this model, including Psyco's
   (see 3. above).
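
The state storage of point 1 could look roughly as follows.  Like the
model itself this is only a proposal-level sketch: ``StateTable``,
``lookup_or_add()`` and the ``same_shape()`` criterion are hypothetical, and
what "compatible enough" should mean is left open above.

```python
class StateTable:
    """Global dict: tuple of green values -> saved red states."""
    def __init__(self, compatible):
        self.compatible = compatible
        self.states = {}

    def lookup_or_add(self, greens, red_state, new_block):
        candidates = self.states.setdefault(greens, [])
        for saved_state, block in candidates:
            if self.compatible(saved_state, red_state):
                # Already seen: close the loop by jumping to that block.
                return block, True
        # Not seen, or not compatible enough: store another state under
        # the same key and keep compiling into the new block.
        candidates.append((red_state, new_block))
        return new_block, False

def same_shape(saved, current):
    # Assumed criterion for this sketch: same number of red variables.
    return len(saved) == len(current)

table = StateTable(same_shape)
first = table.lookup_or_add((0,), ("acc",), "block0")
second = table.lookup_or_add((0,), ("v7",), "block1")
third = table.lookup_or_add((0,), ("v7", "v8"), "block2")
fourth = table.lookup_or_add((1,), ("acc",), "block3")
```

The second call finds a compatible state and returns the existing block;
the third one has a different shape and is stored as a second state under
the same green key.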

The motivation to do point 2. differently than in Psyco is that it is
both more powerful (no extra unrolling/duplication of code) and closer
to what we already have: the bookkeeping code inserted by the
timeshifter in the compiler's graphs.  In Psyco it would have been a
mess to write that bookkeeping code everywhere by hand, not to mention
changing it to experiment with other ideas.

Random notes
-----------------

An idea to consider: red variables in the compiler could come with
a concrete value attached too, which represents a real execution-time
value.  The compiler would perform concrete operations on it in addition
to generating residual operations.  In other words, the compiler would
also perform directly some interpretation as it goes along.  In this
way, we can avoid some of the recompilation by using this attached value
e.g. as the first switch case in the red-to-green promotions, or as a
hint about which outcome of a run-time condition is more likely.
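
A minimal sketch of this idea, with hypothetical names (``RedBox``,
``Builder``, ``genop()``): each red variable carries a sample concrete value
next to its residual name, and every generated operation is also executed
on those values.

```python
import operator

class RedBox:
    """A red variable with one concrete run-time value attached."""
    def __init__(self, name, concrete):
        self.name = name
        self.concrete = concrete

class Builder:
    OPS = {
        "int_add": operator.add,
        "int_sub": operator.sub,
        "int_gt": operator.gt,
    }

    def __init__(self):
        self.operations = []

    def genop(self, opname, a, b):
        # Interpret the operation on the attached values...
        concrete = self.OPS[opname](a.concrete, b.concrete)
        result = RedBox("v%d" % len(self.operations), concrete)
        # ...and also generate the residual operation, as before.
        self.operations.append((opname, a.name, b.name, result.name))
        return result

builder = Builder()
x = RedBox("x", 10)
y = RedBox("y", 3)
total = builder.genop("int_add", x, y)
flag = builder.genop("int_gt", total, y)
likely_outcome = flag.concrete   # hint for which path to compile first
```

The residual operations are the same as without the attached values; the
extra information is only used to guide the compiler.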

Backends
----------------


.. include:: ../_ref.txt
