GRAMMATICA TODO-LIST
====================

Known Issues
------------

  This is a list of the known issues as of the release of version 0.3 
  (2003-05-04). Please verify that any new problem found is not in 
  this list before sending a bug report.

    o General: Error messages need improvement
      Error messages printed by Grammatica could be improved in 
      several ways. First, the faulting line could be printed (as 
      done by javac). Secondly, many of the error texts should be 
      rewritten to be simpler to understand. Thirdly, errors in 
      synthetic productions are reported under their synthetic 
      names, which is confusing for the user.

    o Regular Expressions: Not fully JDK 1.4 compatible
      The regular expression library is still lacking in some ways 
      compared to the implementation in JDK 1.4 (or Perl 5). The 
      library should be extended to support as many constructs as 
      possible.
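
  As an illustration of the first point under "Error messages need 
  improvement" above, the following is a minimal sketch of how a 
  message could be printed together with the faulting line and a 
  caret, in the style of javac. The class ErrorFormatter and its 
  format method are hypothetical names, not part of Grammatica.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the Grammatica API. */
final class ErrorFormatter {

    /**
     * Formats a message followed by the faulting source line and a
     * caret marking the column. Line and column are 1-based.
     */
    static String format(String file, List<String> lines,
                         int line, int column, String message) {
        StringBuilder out = new StringBuilder();
        out.append(file).append(':').append(line).append(": ")
           .append(message).append('\n');
        if (line >= 1 && line <= lines.size()) {
            String text = lines.get(line - 1);
            out.append(text).append('\n');
            // Copy tabs from the source line so the caret stays aligned.
            for (int i = 0; i < column - 1 && i < text.length(); i++) {
                out.append(text.charAt(i) == '\t' ? '\t' : ' ');
            }
            out.append('^').append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("x = 1 +", "y = * 2");
        System.out.print(
            format("test.txt", src, 2, 5, "unexpected token \"*\""));
    }
}
```

  The sketch assumes that the source lines are still available when 
  the error is reported. A real implementation would have to decide 
  whether to buffer the input or reread the file.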


Suggested Improvements
----------------------

  This is a list of the currently known suggested improvements.

    o Grammar: Add support for modular grammars
      By allowing one grammar to reference another, it would be 
      possible to create more complex modular parsers. This feature 
      also requires support for modular tokenizers and parsers, as 
      well as the generation of modular code.

    o Grammar: Add localization support
      This includes finding an architecture that works for grammar 
      files, making it possible to localize the error messages 
      without rewriting the grammar file. This architecture must 
      support different mechanisms depending on the generated source 
      language (ResourceBundle in Java, GNU gettext in C and C++).

    o Grammar: Add support for error recovery
      It should be possible to add error recovery or fallback 
      productions to the grammar. This would make it possible to 
      report various syntax errors. This also requires some sort of 
      error log.

    o Grammar: Identical productions should be unified
      Identical synthetic productions are not identified as such. 
      Instead, a new synthetic production is added in each case. This 
      will cause problems for LR parsers, and is generally 
      inefficient. Identical synthetic productions should be unified.

    o Tokenizer: Speed improvements
      Substantial gains in tokenizer performance may still be 
      possible through further optimization. All the obvious 
      optimizations have already been done, however. The next step is 
      probably to build a DFA (deterministic finite automaton).

    o Tokenizer: Add support for modes
      When reading certain tokens the tokenizer should be able to 
      enter some kind of "mode", limiting the set of tokens that are 
      considered for a match. This is highly useful in some grammars, 
      where the token syntax isn't easily represented with regular 
      expressions.

    o Tokenizer: Add support for "hidden" tokens
      This feature could be used to parse source code while keeping 
      the whitespace tokens accessible in some way, which is very 
      useful when writing a code style checker.

    o Tokenizer: No unit tests exist
      Validation of the tokenizer is not automated as no formal test 
      suite exists. A test suite should contain both positive and 
      negative tests.

    o Parser: Add support for LR grammars
      This probably means writing at least an LALR(1) parser, and 
      probably also some kind of LR(k) parser.

    o Parser: Add parsing through code callbacks
      This would allow more complex grammars to be parsed where some 
      productions cannot be expressed in EBNF. Instead a callback 
      method would be called to parse some specific productions.

    o Analyzer: Allow node values to propagate downwards
      Improve the analyzer framework to handle node values 
      propagating downwards or sideways in the tree as well.

    o Code Generation: Support creation of a C# parser
      This requires the writing of a C# runtime library.

    o Code Generation: Support creation of a C parser
      This requires the writing of a C runtime library plus 
      appropriate code generation classes.

    o Code Generation: Support creation of a C++ parser
      This requires the writing of a C++ runtime library plus 
      appropriate code generation classes.
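
  As an illustration of the item "Identical productions should be 
  unified" above, one possible approach is to look up each synthetic 
  production by its contents before creating a new one. The sketch 
  below is only that: the class SyntheticProductions, the method 
  intern and the "Subproduction" naming scheme are assumptions made 
  for the example, and a production body is simplified to a single 
  list of element names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these names are not Grammatica classes. */
final class SyntheticProductions {

    // Maps the element list of a synthetic production to its name.
    private final Map<List<String>, String> byBody = new HashMap<>();
    private int counter = 0;

    /**
     * Returns the name of the synthetic production with the given
     * elements, creating a new one only if no identical one exists.
     */
    String intern(List<String> elements) {
        // Defensive copy, so later changes by the caller cannot
        // corrupt the map key.
        List<String> key = new ArrayList<>(elements);
        String name = byBody.get(key);
        if (name == null) {
            name = "Subproduction" + (++counter);
            byBody.put(key, name);
        }
        return name;
    }

    /** Returns the number of distinct synthetic productions. */
    int size() {
        return byBody.size();
    }

    public static void main(String[] args) {
        SyntheticProductions table = new SyntheticProductions();
        String a = table.intern(Arrays.asList("\",\"", "Expr"));
        String b = table.intern(Arrays.asList("\",\"", "Expr"));
        String c = table.intern(Arrays.asList("\";\"", "Stmt"));
        System.out.println(a + " " + b + " " + c);
        System.out.println(table.size());
    }
}
```

  Because java.util.List defines equals and hashCode in terms of its 
  elements, two bodies with the same elements in the same order map 
  to the same name. A full solution would also have to compare 
  alternatives and repetition counts, which the sketch leaves out.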


_____________________________________________________________________

Grammatica 0.3 (2003-05-04). See http://www.nongnu.org/grammatica for
more information.

Copyright (c) 2003 Per Cederberg. Permission is granted to copy this 
document verbatim in any medium, provided that this copyright notice 
is left intact.
