GRAMMATICA TODO-LIST
====================

1. Introduction
---------------

  This document lists the currently known issues and suggested
  improvements for this package. Before sending a bug report, please
  check that the problem is not already on this list.


2. Known Issues
---------------

  o General: Error messages could be greatly improved by also showing
    the offending source line (as javac does).

  o Regular Expressions: The regular expression library still lacks
    some features compared to the implementation in JDK 1.4 (or
    Perl 5). The library should be extended to support as many of
    those constructs as possible.

  o Parser: Some ambiguities inside productions cannot currently be
    resolved by the LL(k) parser. Consider the following production:

      Prod = ["one"] "one" "two" ;

    The optional first element conflicts with the second, since both
    start with "one", but the ambiguity is not inherent: the token
    after the first "one" decides whether the optional element is
    present. The parser is nevertheless unable to resolve it, as it
    currently doesn't store look-ahead lists for elements inside
    productions. The production can, however, be rewritten so that
    the parser handles it correctly:

      Prod = "one" "two"
           | "one" "one" "two" ;
  
  o Parser: Some erroneous grammars may cause infinite loops. The
    grammar look-ahead analysis does not currently handle every kind
    of erroneous grammar, and in some cases it may enter an infinite
    loop while attempting to determine the correct number of
    look-ahead tokens for each production.

  o Parser: Parse error (exception) messages do not convey all the
    details they should. Specifically, "unexpected token" errors
    should also list the tokens that were expected.

  o Grammar Analyzer: Identical sub-productions in different
    productions are not identified as such. Instead, a new
    sub-production is added for each parenthesized production in the
    grammar. This will cause errors in LALR(1) parsers and is
    generally inefficient. Identical sub-productions should instead
    be unified.
    
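  The "General" item above asks for javac-style error messages. As a
  rough sketch only, the following Java method shows one way such a
  message could be built: it echoes the offending line of the input
  and puts a caret under the error column. The ErrorFormatter class,
  its parameters, and the 1-based line/column convention are
  assumptions for illustration, not part of the Grammatica API.

```java
/**
 * Hypothetical helper (not Grammatica API): formats a javac-style error
 * message that repeats the offending source line and marks the error
 * column with a caret.
 */
final class ErrorFormatter {

    /**
     * @param file    file name shown in the message
     * @param source  complete input text
     * @param line    1-based line number of the error
     * @param column  1-based column number of the error
     * @param message error description
     */
    static String format(String file, String source, int line, int column, String message) {
        String[] lines = source.split("\r?\n", -1);
        String text = (line >= 1 && line <= lines.length) ? lines[line - 1] : "";
        StringBuilder caret = new StringBuilder();
        for (int i = 1; i < column; i++) {
            // Copy tabs from the echoed line so the caret stays aligned.
            boolean tab = i <= text.length() && text.charAt(i - 1) == '\t';
            caret.append(tab ? '\t' : ' ');
        }
        caret.append('^');
        return file + ":" + line + ": " + message + "\n" + text + "\n" + caret;
    }

    public static void main(String[] args) {
        String grammar = "Prod = \"one\" \"two\"\nOther = Prod ;";
        // Column 19 is just past the end of line 1, where the ';' is missing.
        System.out.println(format("example.grammar", grammar, 1, 19, "';' expected"));
    }
}
```

  Running main prints the message, then line 1 of the grammar, then a
  caret under the column where the ';' is missing.
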
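  To illustrate the LL(k) ambiguity described above, here is a
  hand-written recursive-descent sketch in Java (not Grammatica's
  generated code) for the original production Prod = ["one"] "one"
  "two". It shows the kind of decision the parser would have to make
  if it stored look-ahead for elements inside productions: the
  optional "one" is consumed only when the second look-ahead token is
  also "one", which requires two tokens of look-ahead. The ProdParser
  class and its token-list input are assumptions for illustration.

```java
import java.util.List;

/**
 * Hypothetical recursive-descent parser (not Grammatica output) for
 *
 *   Prod = ["one"] "one" "two" ;
 *
 * The optional element is decided with two tokens of look-ahead.
 */
final class ProdParser {
    private final List<String> tokens;
    private int pos = 0;

    ProdParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns the token k positions ahead, or null past the end of input. */
    private String peek(int k) {
        return pos + k < tokens.size() ? tokens.get(pos + k) : null;
    }

    private void expect(String token) {
        if (!token.equals(peek(0))) {
            throw new IllegalArgumentException(
                "unexpected token " + peek(0) + " at position " + pos + ", expected " + token);
        }
        pos++;
    }

    /** Parses Prod and reports whether the optional "one" was present. */
    boolean parseProd() {
        // One token of look-ahead sees "one" in both cases; the second
        // token tells them apart: "one" means the optional element is there.
        boolean optionalPresent = "one".equals(peek(0)) && "one".equals(peek(1));
        if (optionalPresent) {
            expect("one");
        }
        expect("one");
        expect("two");
        if (peek(0) != null) {
            throw new IllegalArgumentException("unexpected token " + peek(0) + " at position " + pos);
        }
        return optionalPresent;
    }

    public static void main(String[] args) {
        System.out.println(new ProdParser(List.of("one", "two")).parseProd());        // false
        System.out.println(new ProdParser(List.of("one", "one", "two")).parseProd()); // true
    }
}
```
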
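  The item on infinite loops does not say which erroneous grammars
  trigger them. Left recursion (a production that can begin with
  itself, such as Expr = Expr "+" Term) is a classic case that makes
  naive LL look-ahead computation recurse forever, so the sketch below
  assumes that case. It detects left recursion with a depth-first
  search that records visited productions and stops instead of
  revisiting them. The grammar representation (production name to
  alternatives to element names, with names not in the map treated as
  tokens) and the LeftRecursionCheck class are hypothetical, and
  optional leading elements are ignored for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical check (not Grammatica API): reports whether a production
 * is left-recursive. Grammar format: production name -> alternatives ->
 * element names; names that are not keys of the map are tokens.
 */
final class LeftRecursionCheck {

    static boolean isLeftRecursive(Map<String, List<List<String>>> grammar, String name) {
        return reaches(grammar, name, name, new HashSet<>());
    }

    /** True if 'current' can begin with 'target' via first elements only. */
    private static boolean reaches(Map<String, List<List<String>>> grammar,
                                   String current, String target, Set<String> visited) {
        if (!visited.add(current)) {
            // Already explored: stopping here is what prevents an endless loop.
            return false;
        }
        for (List<String> alternative : grammar.getOrDefault(current, List.of())) {
            if (alternative.isEmpty()) {
                continue;
            }
            String first = alternative.get(0);
            if (first.equals(target)
                    || (grammar.containsKey(first) && reaches(grammar, first, target, visited))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> grammar = Map.of(
            "Expr", List.of(List.of("Expr", "\"+\"", "Term"), List.of("Term")),
            "Term", List.of(List.of("NUMBER")));
        System.out.println(isLeftRecursive(grammar, "Expr")); // true
        System.out.println(isLeftRecursive(grammar, "Term")); // false
    }
}
```
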
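  For the missing expected-token list, a sketch of an exception whose
  message names both the unexpected token and the alternatives that
  would have been accepted might look like this.
  UnexpectedTokenException and its constructor arguments are
  hypothetical; Grammatica's real parse exception class may be
  structured differently, and the parser would have to supply the
  expected list from its own look-ahead information.

```java
import java.util.List;

/**
 * Hypothetical exception (not Grammatica's ParseException): the message
 * lists the tokens that would have been accepted at the error position.
 */
final class UnexpectedTokenException extends Exception {
    private final List<String> expected;

    UnexpectedTokenException(String found, List<String> expected, int line, int column) {
        super(buildMessage(found, expected, line, column));
        this.expected = List.copyOf(expected);
    }

    /** The tokens the parser would have accepted instead. */
    List<String> getExpected() {
        return expected;
    }

    private static String buildMessage(String found, List<String> expected, int line, int column) {
        StringBuilder msg = new StringBuilder("unexpected token ")
            .append(found).append(" on line ").append(line).append(", column ").append(column);
        if (!expected.isEmpty()) {
            msg.append(", expected ");
            for (int i = 0; i < expected.size(); i++) {
                if (i > 0) {
                    // Commas between items, "or" before the last one.
                    msg.append(i == expected.size() - 1 ? " or " : ", ");
                }
                msg.append(expected.get(i));
            }
        }
        return msg.toString();
    }

    public static void main(String[] args) {
        Exception e = new UnexpectedTokenException(
            "\"two\"", List.of("\"one\"", "\"(\"", "IDENTIFIER"), 3, 7);
        System.out.println(e.getMessage());
    }
}
```

  Running main prints: unexpected token "two" on line 3, column 7,
  expected "one", "(" or IDENTIFIER
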
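  Unifying identical sub-productions amounts to giving structurally
  equal groups a single shared definition. A minimal sketch of that
  idea, assuming each parenthesized group is represented as a list of
  alternatives (each a list of element names), keys a map on that
  structure so that a repeated group reuses the name of the first
  one. The SubProductionRegistry class and the generated names are
  hypothetical. This version compares alternatives in order, so
  ("a" | "b") and ("b" | "a") would still get separate names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical registry (not Grammatica API): returns one synthetic
 * production name per distinct parenthesized group.
 */
final class SubProductionRegistry {
    private final Map<List<List<String>>, String> names = new HashMap<>();
    private int counter = 0;

    /** Returns the name of an identical earlier group, or creates a new one. */
    String nameFor(List<List<String>> alternatives) {
        // Immutable copies make a safe map key; List equality is structural.
        List<List<String>> key = alternatives.stream().map(List::copyOf).toList();
        return names.computeIfAbsent(key, k -> "Subproduction" + (++counter));
    }

    /** Number of distinct sub-productions created so far. */
    int size() {
        return names.size();
    }

    public static void main(String[] args) {
        SubProductionRegistry registry = new SubProductionRegistry();
        // The group ("+" Term | "-" Term) written in two different productions:
        List<List<String>> first = List.of(List.of("\"+\"", "Term"), List.of("\"-\"", "Term"));
        List<List<String>> second = List.of(List.of("\"+\"", "Term"), List.of("\"-\"", "Term"));
        System.out.println(registry.nameFor(first));  // Subproduction1
        System.out.println(registry.nameFor(second)); // Subproduction1
        System.out.println(registry.size());          // 1
    }
}
```
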

3. Suggested Improvements
-------------------------

  o Tokenizer: Improve speed. Further optimizations might still yield
    substantial performance gains, although all the obvious ones have
    already been done.

  o Tokenizer: Add support for modes. When reading certain tokens, the
    tokenizer should be able to enter a "mode" that limits the set of
    tokens considered for a match. This is highly useful in grammars
    where the token syntax isn't easily represented with a single
    regular expression.

  o Tokenizer: Add support for "hidden" tokens. This feature could be
    used to parse source code while keeping the whitespace tokens
    accessible in some way, which would be very useful for writing a
    code style checker.

  o Parser: Add support for error recovery and error logging.

  o Parser: Write an LALR(1) parser.

  o Parser: Write an LR(n) parser.

  o Parser: Add parser callbacks, allowing complex grammars to be
    parsed by custom source code instead of by the parser.
    
  o Analyzer: Improve the analyzer framework so that node values can
    also propagate downwards or sideways in the tree.

  o Code Generation: Allow creation of a C# parser. This requires
    writing a C# runtime library.

  o Code Generation: Allow creation of a C parser. This requires
    writing a C runtime library plus appropriate code generation
    classes.

  o Code Generation: Allow creation of a C++ parser. This requires
    writing a C++ runtime library plus appropriate code generation
    classes.
  
  o Add localization support for error and warning messages. This
    includes finding an architecture that works for grammar files,
    making it possible to localize the error messages without
    rewriting the grammar file. The architecture must support
    different mechanisms depending on the language of the generated
    source (resource bundles in Java, GNU gettext in C and C++).

  o Add support for modular grammars and lexers. This would make it
    possible to reference another grammar from within a grammar,
    creating multiple parsers at once. The generated code must of
    course also be modular.

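  The tokenizer mode proposal could work along the lines of this Java
  sketch, in which every mode has its own token definitions and a
  token may name the mode to switch to. Inside a string literal only
  the string-body pattern is tried, so "a b" becomes one TEXT token
  rather than WORD, SPACE, WORD. The ModalTokenizer class, the mode
  names, and the patterns are illustrative assumptions, not Grammatica
  syntax; the longest-match rule is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical tokenizer with modes (not Grammatica API): only the token
 * definitions of the current mode are tried, and a token may switch mode.
 */
final class ModalTokenizer {

    /** Token name, pattern, and mode to enter after the token (null = stay). */
    record TokenDef(String name, Pattern pattern, String nextMode) {}

    private final Map<String, List<TokenDef>> modes;

    ModalTokenizer(Map<String, List<TokenDef>> modes) {
        this.modes = modes;
    }

    /** Returns tokens as "NAME(text)" strings, starting in startMode. */
    List<String> tokenize(String input, String startMode) {
        List<String> result = new ArrayList<>();
        String mode = startMode;
        int pos = 0;
        while (pos < input.length()) {
            TokenDef best = null;
            int bestEnd = pos;
            // Longest non-empty match among the current mode's definitions
            // wins; ties go to the definition listed first.
            for (TokenDef def : modes.get(mode)) {
                Matcher m = def.pattern().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = def;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException(
                    "no token matches at position " + pos + " in mode " + mode);
            }
            result.add(best.name() + "(" + input.substring(pos, bestEnd) + ")");
            if (best.nextMode() != null) {
                mode = best.nextMode();
            }
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<TokenDef>> modes = Map.of(
            "DEFAULT", List.of(
                new TokenDef("WORD", Pattern.compile("[a-z]+"), null),
                new TokenDef("SPACE", Pattern.compile(" +"), null),
                new TokenDef("QUOTE", Pattern.compile("\""), "STRING")),
            "STRING", List.of(
                new TokenDef("TEXT", Pattern.compile("[^\"]+"), null),
                new TokenDef("QUOTE", Pattern.compile("\""), "DEFAULT")));
        // Prints [WORD(say), SPACE( ), QUOTE("), TEXT(a b), QUOTE(")]
        System.out.println(new ModalTokenizer(modes).tokenize("say \"a b\"", "DEFAULT"));
    }
}
```
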
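  For hidden tokens, one possible design (an assumption, not a decided
  one) is to flag tokens as hidden rather than discard them: the
  parser reads a filtered view, while a style checker walks the full
  list. The Java sketch below uses a trivial word/whitespace
  tokenizer; HiddenTokenDemo, Token, and the token types are
  hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Grammatica API): whitespace tokens are kept
 * but flagged as hidden, so the parser can skip them while other tools
 * still see the complete token list.
 */
final class HiddenTokenDemo {

    record Token(String type, String image, boolean hidden) {}

    /** Splits input into WORD tokens and hidden WHITESPACE tokens. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean space = Character.isWhitespace(input.charAt(pos));
            int end = pos;
            // Extend the token while characters stay in the same class.
            while (end < input.length() && Character.isWhitespace(input.charAt(end)) == space) {
                end++;
            }
            tokens.add(new Token(space ? "WHITESPACE" : "WORD", input.substring(pos, end), space));
            pos = end;
        }
        return tokens;
    }

    /** The view a parser would consume: hidden tokens filtered out. */
    static List<Token> visible(List<Token> tokens) {
        return tokens.stream().filter(t -> !t.hidden()).toList();
    }

    public static void main(String[] args) {
        List<Token> all = tokenize("int  x");
        System.out.println(visible(all).size()); // 2: "int" and "x"
        // A style checker can still inspect the hidden whitespace.
        for (Token t : all) {
            if (t.hidden() && t.image().length() > 1) {
                System.out.println("found " + t.image().length() + " consecutive whitespace characters");
            }
        }
    }
}
```
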
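  The localization item names resource bundles as the Java mechanism.
  As a minimal sketch of keeping message text out of both the code and
  the grammar file, the following Java class looks messages up by key
  in a java.util.PropertyResourceBundle and fills in the arguments
  with java.text.MessageFormat. The message key, the LocalizedMessages
  class, and the German translation are illustrative assumptions; a
  real setup would more likely keep one .properties file per locale
  and load it with ResourceBundle.getBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/**
 * Hypothetical sketch (not Grammatica API): error texts are looked up by
 * key, so translating them does not touch the code or the grammar file.
 */
final class LocalizedMessages {

    // Inlined .properties contents, one per language, to keep the sketch
    // self-contained.
    static final String ENGLISH = "unexpected.token=unexpected token {0}, expected {1}\n";
    static final String GERMAN = "unexpected.token=unerwartetes Token {0}, erwartet wurde {1}\n";

    /** Looks up 'key' in the given properties text and substitutes the arguments. */
    static String format(String properties, String key, Object... args) {
        try {
            ResourceBundle bundle = new PropertyResourceBundle(new StringReader(properties));
            return MessageFormat.format(bundle.getString(key), args);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(ENGLISH, "unexpected.token", "\"two\"", "\"one\""));
        System.out.println(format(GERMAN, "unexpected.token", "\"two\"", "\"one\""));
    }
}
```
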
_____________________________________________________________________

Copyright (c) 2003 Per Cederberg. Permission is granted to copy this 
document verbatim in any medium, provided that this copyright notice 
is left intact.
