.. -*- rest -*-

======================
Fortran parser package
======================

:Author:
  Pearu Peterson <pearu.peterson@gmail.com>
:Created: September 2006


.. contents:: Table of Contents

Overview
========

The Fortran parser package is a Python implementation
of Fortran 66/77/90/95/2003 language parser. The code
is under NumPy SVN tree: `numpy/f2py/lib/parser/`__.
The Fortran language syntax rules are defined in `Fortran2003.py`__,
the rules are taken from the following ISO/IEC 1539 working draft:
http://j3-fortran.org/doc/2003_Committee_Draft/04-007.pdf.

__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/
__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/Fortran2003.py

Fortran parser package structure
================================

`numpy.f2py.lib.parser` package contains the following files:

api.py - public API for Fortran parser
--------------------------------------

`This file`__ exposes `Statement` subclasses, `CHAR_BIT` constant, and a function `parse`.

__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/api.py

Function `parse(<input>, ..)` parses, analyzes and returns a `Statement`
tree of Fortran input. For example,

::

  >>> from api import parse
  >>> code = """
  ... c comment
  ...       subroutine foo(a)
  ...       integer a
  ...       print*,"a=",a
  ...       end
  ... """
  >>> tree = parse(code,isfree=False)
  >>> print tree
        !BEGINSOURCE <cStringIO.StringI object at 0xb75ac410> mode=fix90
          SUBROUTINE foo(a)
            INTEGER a
            PRINT *, "a=", a
          END SUBROUTINE foo
  >>>
  >>> tree
        BeginSource
          blocktype='beginsource'
          name='<cStringIO.StringI object at 0xb75ac410> mode=fix90'
          a=AttributeHolder:
        external_subprogram=<dict with keys ['foo']>
          content:
            Subroutine
              args=['a']
              item=Line('subroutine foo(a)',(3, 3),'')
              a=AttributeHolder:
          variables=<dict with keys ['a']>
              content:
                Integer
                  selector=('', '')
                  entity_decls=['a']
                  item=Line('integer a',(4, 4),'')
                Print
                  item=Line('print*,"a=",a',(5, 5),'')
            EndSubroutine
              blocktype='subroutine'
              name='foo'
              item=Line('end',(6, 6),'')

readfortran.py
--------------

`This file`__ contains tools for reading Fortran codes from file and string objects.

__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/readfortran.py

To read Fortran code from a file, use `FortranFileReader` class.
`FortranFileReader` class is iterator over Fortran code lines
and is derived from `FortranReaderBase` class.
It automatically handles the line continuations and comments, as
well as it detects if Fortran file is in the free or fixed format.

For example,

::

  >>> from readfortran import *
  >>> import os
  >>> reader = FortranFileReader(os.path.expanduser('~/src/blas/daxpy.f'))
  >>> reader.next()
  Line('subroutine daxpy(n,da,dx,incx,dy,incy)',(1, 1),'')
  >>> reader.next()
  Comment('c     constant times a vector plus a vector.\nc     uses unrolled loops for increments equal to one.\nc     jack dongarra, linpack, 3/11/78.\nc     modified 12/3/93, array(1) declarations changed to array(*)',(3, 6))
  >>> reader.next()
  Line('double precision dx(*),dy(*),da',(8, 8),'')
  >>> reader.next()
  Line('integer i,incx,incy,ix,iy,m,mp1,n',(9, 9),'')

Note that `FortranReaderBase.next()` method may return `Line`, `SyntaxErrorLine`, `Comment`, `MultiLine`,
`SyntaxErrorMultiLine` instances.

`Line` instance has the following attributes:

  * `.line` - contains Fortran code line
  * `.span` - a 2-tuple containing the span of line numbers containing
    Fortran code in the original Fortran file
  * `.label` - the label of Fortran code line
  * `.reader` - the `FortranReaderBase` class instance
  * `.strline` - if it is not `None` then it contains Fortran code line with parenthesis
    content and string literal constants saved in the `.strlinemap` dictionary.
  * `.is_f2py_directive` - `True` if line starts with the f2py directive comment.

and the following methods:

  * `.get_line()` - returns `.strline` (also evalutes it if None). Also
    handles Hollerith contstants in the fixed F77 mode.
  * `.isempty()`  - returns `True` if Fortran line contains no code.
  * `.copy(line=None, apply_map=False)` - returns a `Line` instance
    with given `.span`, `.label`, `.reader` information but the line content
    replaced with `line` (when not `None`) and applying `.strlinemap`
    mapping (when `apply_map` is `True`).
  * `.apply_map(line)` - apply `.strlinemap` mapping to line content.
  * `.has_map()` - returns `True` if `.strlinemap` mapping exists.

For example,

::

  >>> item = reader.next()
  >>> item
  Line('if(n.le.0)return',(11, 11),'')
  >>> item.line
  'if(n.le.0)return'
  >>> item.strline
  'if(F2PY_EXPR_TUPLE_4)return'
  >>> item.strlinemap
  {'F2PY_EXPR_TUPLE_4': 'n.le.0'}
  >>> item.label
  ''
  >>> item.span 
  (11, 11)
  >>> item.get_line()
  'if(F2PY_EXPR_TUPLE_4)return'
  >>> item.copy('if(F2PY_EXPR_TUPLE_4)pause',True)
  Line('if(n.le.0)pause',(11, 11),'')

`Comment` instance has the following attributes:

  * `.comment` - a comment string
  * `.span` - a 2-tuple containing the span of line numbers containing
    Fortran comment in the original Fortran file
  * `.reader` - the `FortranReaderBase` class instance

and `.isempty()` method.

`MultiLine` class represents multiline syntax in the .pyf files::

  <prefix>'''<lines>'''<suffix>

`MultiLine` instance has the following attributes:

  * `.prefix` - the content of `<prefix>`
  * `.block` - a list of lines
  * `.suffix` - the content of `<suffix>`
  * `.span` - a 2-tuple containing the span of line numbers containing
    multiline syntax in the original Fortran file
  * `.reader` - the `FortranReaderBase` class instance

and `.isempty()` method.

`SyntaxErrorLine` and `SyntaxErrorMultiLine` are like `Line` and `MultiLine`
classes, respectively, with a functionality of issuing an error
message to `sys.stdout` when constructing an instance of the corresponding
class.

To read a Fortran code from a string, use `FortranStringReader` class::

  reader = FortranStringReader(<string>, <isfree>, <isstrict>)

where the second and third arguments are used to specify the format
of the given `<string>` content. When `<isfree>` and `<isstrict>` are both
`True`, the content of a .pyf file is assumed. For example,

::

  >>> code = """                       
  ... c      comment
  ...       subroutine foo(a)
  ...       print*, "a=",a
  ...       end
  ... """
  >>> reader = FortranStringReader(code, False, True)
  >>> reader.next()
  Comment('c      comment',(2, 2))
  >>> reader.next()
  Line('subroutine foo(a)',(3, 3),'')
  >>> reader.next()
  Line('print*, "a=",a',(4, 4),'')
  >>> reader.next()
  Line('end',(5, 5),'')

`FortranReaderBase` has the following attributes:

  * `.source` - a file-like object with `.next()` method to retrive 
    a source code line
  * `.source_lines` - a list of read source lines
  * `.reader` - a `FortranReaderBase` instance for reading files
    from INCLUDE statements.
  * `.include_dirs` - a list of directories where INCLUDE files
    are searched. Default is `['.']`.

and the following methods:

  * `.set_mode(isfree, isstrict)` - set Fortran code format information
  * `.close_source()` - called when `.next()` raises `StopIteration` exception.

parsefortran.py
---------------

`This file`__ contains code for parsing Fortran code from `FortranReaderBase` iterator.

__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/parsefortran.py

`FortranParser` class holds the parser information while
iterating over items returned by `FortranReaderBase` iterator.
The parsing information, collected when calling `.parse()` method,
is saved in `.block` attribute as an instance
of `BeginSource` class defined in `block_statements.py` file.

For example,

::

  >>> reader = FortranStringReader(code, False, True)
  >>> parser = FortranParser(reader)
  >>> parser.parse()
  >>> print parser.block
        !BEGINSOURCE <cStringIO.StringI object at 0xb751d500> mode=fix77
          SUBROUTINE foo(a)
            PRINT *, "a=", a
          END SUBROUTINE foo

Files `block_statements.py`__, `base_classes.py`__, `typedecl_statements.py`__, `statements.py`__
-------------------------------------------------------------------------------------------------

__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/block_statements.py
__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/base_classes.py
__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/typedecl_statements.py
__ http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/f2py/lib/parser/statements.py

The model for representing Fortran code statements consists of a tree of `Statement`
classes defined in `base_classes.py`. There are two types of statements: one-line
statements and block statements. Block statements consists of start and end
statements, and content statements in between that can be of both types again.

`Statement` instance has the following attributes:

  * `.parent`  - it is either parent block-type statement or `FortranParser` instance.
  * `.item`    - a `Line` instance containing Fortran statement line information, see above.
  * `.isvalid` - when `False` then processing this `Statement` instance will be skipped,
    for example, when the content of `.item` does not match with
    the `Statement` class.
  * `.ignore`  - when `True` then the `Statement` instance will be ignored.
  * `.modes`   - a list of Fortran format modes where the `Statement` instance is valid.

and the following methods:

  * `.info(message)`, `.warning(message)`, `.error(message)` - to spit out messages to
    `sys.stderr` stream.
  * `.get_variable(name)` - get `Variable` instance by name that is defined in
    current namespace. If name is not defined, then the corresponding
    `Variable` instance is created.
  * `.analyze()` - calculate various information about the `Statement`, this information
    is saved in `.a` attribute that is `AttributeHolder` instance.

All statement classes are derived from the `Statement` class. Block statements are
derived from the `BeginStatement` class and is assumed to end with an `EndStatement`
instance in `.content` attribute list. `BeginStatement` and `EndStatement` instances
have the following attributes:

  * `.name`      - name of the block, blocks without names use line label
    as the name.
  * `.blocktype` - type of the block (derived from class name)
  * `.content`   - a list of `Statement` (or `Line`) instances.

and the following methods:

  * `.__str__()` - returns string representation of Fortran code.

A number of statements may declare a variable that is used in other
statement expressions. Variables are represented via `Variable` class
and its instances have the following attributes:

  * `.name`      - name of the variable
  * `.typedecl`  - type declaration
  * `.dimension` - list of dimensions
  * `.bounds`    - list of bounds
  * `.length`    - length specs
  * `.attributes` - list of attributes
  * `.bind`      - list of bind information
  * `.intent`    - list of intent information
  * `.check`     - list of check expressions
  * `.init`      - initial value of the variable
  * `.parent`    - statement instance declaring the variable
  * `.parents`   - list of statements that specify variable information

and the following methods:

  * `.is_private()`
  * `.is_public()`
  * `.is_allocatable()`
  * `.is_external()`
  * `.is_intrinsic()`
  * `.is_parameter()`
  * `.is_optional()`
  * `.is_required()`

The following type declaration statements are defined in `typedecl_statements.py`:

  `Integer`, `Real`, `DoublePrecision`, `Complex`, `DoubleComplex`, `Logical`,
  `Character`, `Byte`, `Type`, `Class`

and they have the following attributes:

  * `.selector`           - contains lenght and kind specs
  * `.entity_decls`, `.attrspec`

and methods:

  * `.tostr()` - return string representation of Fortran type declaration
  * `.astypedecl()` - pure type declaration instance, it has no `.entity_decls`
    and `.attrspec`.
  * `.analyze()` - processes `.entity_decls` and `.attrspec` attributes and adds
    `Variable` instance to `.parent.a.variables` dictionary.

The following block statements are defined in `block_statements.py`:

  `BeginSource`, `Module`, `PythonModule`, `Program`, `BlockData`, `Interface`,
  `Subroutine`, `Function`, `Select`, `Where`, `Forall`, `IfThen`, `If`, `Do`,
  `Associate`, `TypeDecl (Type)`, `Enum`

Block statement classes may have different properties which are declared via
deriving them from the following classes:

  `HasImplicitStmt`, `HasUseStmt`, `HasVariables`, `HasTypeDecls`,
  `HasAttributes`, `HasModuleProcedures`, `ProgramBlock`

In summary, the `.a` attribute may hold different information sets as follows:

  * `BeginSource` - `.module`, `.external_subprogram`, `.blockdata`
  * `Module` - `.attributes`, `.implicit_rules`, `.use`, `.use_provides`, `.variables`,
    `.type_decls`, `.module_subprogram`, `.module_data`
  * `PythonModule` - `.implicit_rules`, `.use`, `.use_provides`
  * `Program` - `.attributes`, `.implicit_rules`, `.use`, `.use_provides`
  * `BlockData` - `.implicit_rules`, `.use`, `.use_provides`, `.variables`
  * `Interface` - `.implicit_rules`, `.use`, `.use_provides`, `.module_procedures`
  * `Function`, `Subroutine` - `.implicit_rules`, `.attributes`, `.use`, `.use_statements`,
    `.variables`, `.type_decls`, `.internal_subprogram`
  * `TypeDecl` - `.variables`, `.attributes`

Block statements have the following methods:

  * `.get_classes()` - returns a list of `Statement` classes that are valid
    as a content of given block statement.

The following one line statements are defined:

  `Implicit`, `TypeDeclarationStatement` derivatives (see above),
  `Assignment`, `PointerAssignment`, `Assign`, `Call`, `Goto`, `ComputedGoto`,
  `AssignedGoto`, `Continue`, `Return`, `Stop`, `Print`, `Read`, `Write`, `Flush`,
  `Wait`, `Contains`, `Allocate`, `Deallocate`, `ModuleProcedure`, `Access`,
  `Public`, `Private`, `Close`, `Cycle`, `Backspace`, `Endfile`, `Reeinf`, `Open`,
  `Format`, `Save`, `Data`, `Nullify`, `Use`, `Exit`, `Parameter`, `Equivalence`,
  `Dimension`, `Target`, `Pointer`, `Protected`, `Volatile`, `Value`,
  `ArithmeticIf`, `Intrinsic`, `Inquire`, `Sequence`, `External`, `Namelist`,
  `Common`, `Optional`, `Intent`, `Entry`, `Import`, `Forall`,
  `SpecificBinding`, `GenericBinding`, `FinalBinding`, `Allocatable`,
  `Asynchronous`, `Bind`, `Else`, `ElseIf`, `Case`, `Where`, `ElseWhere`,
  `Enumerator`, `FortranName`, `Threadsafe`, `Depend`, `Check`,
  `CallStatement`, `CallProtoArgument`, `Pause`

