Implementing GHC External Core: an experiment with the BNF Converter
--------------------------------------------------------------------

by Aarne Ranta 
6/11/2003

The work starts from the files
ExternalCore.lhs and ParserExternalCore.y contained in the GHC External Core
source package. The goal is to get an abstract syntax as close as possible
to the original. The hope is that External Core is well-behaved, since
it has few concerns for "user-friendly" syntax.


THE PHASES OF THE WORK

Start 11.10, by converting the abstract syntax file to BNF grammar
with nonterminals only. Finished 11.24. Then go through the Happy
file to insert terminals properly. At 12.00 the resulting grammar file,
Core.cf, compiles with bnfc, but with some conflicts.

After an hour's lunch break, find the obvious reduce/reduce
conflict between Var and DCon in Exp. Change Var to unqualified
identifiers. Start testing with a hello-world program, Hello.hcr,
generated by

  ghc -fext-core AbsCore.hs Hello.hs

This gets parsed, but some other files don't. The reason turns out
to be that GHC 5.02.2 generates qualified identifiers where the
Core syntax expects unqualified ones. Changing this does not quite suffice,
however, since GHC also generates unqualified ones; therefore,
divide the constructor for Vdef into two, VdefQ and VdefU.
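
The effect of the split on the abstract syntax can be sketched in Haskell.
This is only an illustration under simplifying assumptions: the types Ident,
Vdef and VdefH and the functions toHand and fromHand are hypothetical
stand-ins, and the body of a definition is reduced to a String, which is not
what the generated AbsCore.hs contains.

```haskell
module Main where

-- Hypothetical, simplified stand-ins: the real AbsCore.hs generated by BNFC
-- has different argument types (the body is an expression, not a String).
newtype Ident = Ident String
  deriving (Eq, Show)

-- BNFC-style abstract syntax: one constructor per surface form of the name.
data Vdef
  = VdefQ Ident Ident String  -- qualifier, name, body
  | VdefU Ident String        -- name, body
  deriving (Eq, Show)

-- Hand-written style: a single shape with an optional qualifier.
data VdefH = VdefH (Maybe Ident) Ident String
  deriving (Eq, Show)

-- Merge the two constructors into the single optional-qualifier shape.
toHand :: Vdef -> VdefH
toHand (VdefQ m x e) = VdefH (Just m) x e
toHand (VdefU x e)   = VdefH Nothing x e

-- Split the optional-qualifier shape back into the two constructors.
fromHand :: VdefH -> Vdef
fromHand (VdefH (Just m) x e) = VdefQ m x e
fromHand (VdefH Nothing x e)  = VdefU x e

main :: IO ()
main = mapM_ (print . toHand)
  [ VdefQ (Ident "Main") (Ident "main") "body"
  , VdefU (Ident "x") "body"
  ]
```

The two constructors carry the same information as one constructor with an
optional qualifier, so the conversion is lossless in both directions.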

At this point, test files get so big (3860 in AbsCore.hcr) that Hugs does
not manage them, so create a compilable test file TopCore, compiled with

  ghc --make -i/home/aarne/BNFC TopCore.hs -o TopCore

Now manage to parse AbsCore.hcr at 14.02, after 1 hour's work on grammar writing,
1 hour on debugging. One reduce/reduce conflict remains. Come back to it
later and locate it in the %forall rule: I had just missed the dot (".") in
the Happy file! The ParCore.info file was useful in locating it, in 5 minutes.

Trying other examples, there is still a problem with string and character 
literals: the standard ones of BNFC do not handle everything that appears in Core.
First experiment with changes in the generated file LexCore.x. Then do it
the proper way, by defining the token types Str and Chr in Core.cf.
At 15.18 manage to parse all my examples with the BNFC-generated parser, the
biggest one being the parser of Core itself:

  wc ParCore.hcr
  64907  143661 1797533 ParCore.hcr

This document was written as a running log while programming.
Some clean-up was done afterwards, and also some comments were added
to Core.cf. Next morning, the document was rewritten to the current shape.


CONCLUSIONS

The work took 1h grammar writing, 1h debugging, 30m fine-tuning.

The resulting grammar parses all tested examples, but the abstract
syntax is slightly different, mostly due to BNFC not having polymorphic
pair and Maybe types. In addition, the original uses some foldr's
as semantic actions, where we just have to retain the lists.
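
The difference between folding in the parser and retaining a list can be
sketched as follows. This is a minimal illustration, assuming that lambda
abstraction with several binders is one of the places where the original
folds; the types ExpB and Exp and the functions desugar, sugar and collect
are hypothetical and much smaller than the real abstract syntaxes.

```haskell
module Main where

-- Hypothetical BNFC-style syntax: the list of binders is retained as parsed.
data ExpB = VarB String | LamB [String] ExpB
  deriving (Eq, Show)

-- Hypothetical hand-written style: one binder per Lam, as a parser would
-- produce with a semantic action of the form "foldr Lam body binders".
data Exp = Var String | Lam String Exp
  deriving (Eq, Show)

-- Do after parsing what the original does inside the parser.
desugar :: ExpB -> Exp
desugar (VarB x)    = Var x
desugar (LamB bs e) = foldr Lam (desugar e) bs

-- Gather the binders of a chain of nested Lams, together with the final body.
collect :: Exp -> ([String], Exp)
collect (Lam b e) = let (bs, body) = collect e in (b : bs, body)
collect e         = ([], e)

-- Go back to the list form, grouping each maximal chain of Lams.
sugar :: Exp -> ExpB
sugar (Var x)     = VarB x
sugar e@(Lam _ _) = LamB bs (sugar body)
  where (bs, body) = collect e

main :: IO ()
main = do
  let e = LamB ["x", "y"] (VarB "x")
  print (desugar e)
  print (sugar (desugar e))
```

In this sketch desugar (sugar e) returns e for every e, while sugar (desugar e)
may regroup the binders of directly nested abstractions into one list.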

The External Core language is reasonably well-behaved, and the source
files gave good support to the grammar development.

The pretty-printer might be fine-tuned. In particular, the qualifier
dots (but not the %forall dots!) should not be separated by spaces.

It seems straightforward to translate back and forth between the
original syntax and our AbsCore.hs. However, if the External Core
language had been defined in the BNF converter language from the
beginning, this would not be necessary. The generated abstract syntax 
is not very much worse than the hand-written one. There would be
a guaranteed match between the abstract syntax, the parser, the pretty
printer, and the language document - and only a fraction of the current
amount of code and text would have had to be written:

     99     501    2879 Core.cf

instead of

     89     243    1324 ExternalCore.lhs
    240    1042    5168 ParserExternalCore.y
    168     906    4667 PprExternalCore.lhs
    497    2191   11159 total

where the lexer source and the language document are still missing.


REFERENCES

The BNF Converter: http://www.cs.chalmers.se/~markus/BNFC/

GHC External Core: http://www.haskell.org/ghc/docs/papers/core.ps.gz
