Parser.txt
==========

Design documentation for Biopython parsers.



Design Overview
---------------

Parsers are built around an event-oriented design that includes
Scanner and Consumer objects.

Scanners take input from a data source and analyze it line by line,
sending off an event whenever they recognize some information in the
data.  For example, if the data includes information about an organism
name, the scanner may generate an "organism_name" event whenever it
encounters a line containing the name.

Consumers are objects that receive the events generated by Scanners.
Following the previous example, the consumer receives the
"organism_name" event and then processes it in whatever manner is
necessary for the current application.
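
As a minimal sketch of this interaction, the example below pairs a
scanner with a consumer for the "organism_name" event.  The class
names and the "ORGANISM" line prefix are hypothetical and chosen only
for illustration; they do not describe any real data format.

```python
import io


class OrganismScanner:
    """Hypothetical scanner: sends 'organism_name' for ORGANISM lines."""

    def feed(self, handle, consumer):
        for line in handle:
            if line.startswith("ORGANISM"):
                consumer.organism_name(line)


class OrganismConsumer:
    """Hypothetical consumer: stores each organism name it receives."""

    def __init__(self):
        self.names = []

    def organism_name(self, line):
        # Drop the "ORGANISM" keyword and keep the rest of the line.
        self.names.append(line[len("ORGANISM"):].strip())


data = io.StringIO("LOCUS       X\nORGANISM    Bacillus subtilis\n")
consumer = OrganismConsumer()
OrganismScanner().feed(data, consumer)
print(consumer.names)  # ['Bacillus subtilis']
```

The scanner knows how to find the information, while the consumer
decides what to do with it, so either side can be replaced without
changing the other.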


Events
------

There are two types of events: info events that tag the location of
information within a data stream, and section events that mark
sections within a stream.  Info events are associated with specific
lines within the data, while section events are not.

Section event names must be in the format begin_EVENTNAME and
end_EVENTNAME where EVENTNAME is the name of the event.

For example, a FASTA-formatted sequence scanner may generate the
following events:

EVENT NAME      ORIGINAL INPUT
begin_sequence
title           >gi|132871|sp|P19947|RL30_BACSU 50S RIBOSOMAL PROTEIN L30 (BL27
sequence        MAKLEITLKRSVIGRPEDQRVTVRTLGLKKTNQTVVHEDNAAIRGMINKVSHLVSVKEQ
end_sequence
begin_sequence
title           >gi|132679|sp|P19946|RL15_BACSU 50S RIBOSOMAL PROTEIN L15
sequence        MKLHELKPSEGSRKTRNRVGRGIGSGNGKTAGKGHKGQNARSGGGVRPGFEGGQMPLFQRLPK
sequence        RKEYAVVNLDKLNGFAEGTEVTPELLLETGVISKLNAGVKILGNGKLEKKLTVKANKFSASAK
sequence        GTAEVI
end_sequence
[...]

(The input lines are cut short here to keep the listing narrow.)

The FASTA scanner generated the following events: 'title', 'sequence',
'begin_sequence', and 'end_sequence'.  Note that the 'begin_sequence'
and 'end_sequence' events are not associated with any line in the
original input.  They are used to delineate separate sequences within
the file.
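
Because section events come in pairs, a stream of event names can be
checked for consistency.  The sketch below assumes the begin_/end_
prefixes used in the FASTA example above; the function name
check_sections is hypothetical.

```python
def check_sections(event_names):
    """Return True if every begin_X is closed by a matching end_X."""
    open_sections = []
    for name in event_names:
        if name.startswith("begin_"):
            open_sections.append(name[len("begin_"):])
        elif name.startswith("end_"):
            # An end event must close the most recently opened section.
            if not open_sections or open_sections.pop() != name[len("end_"):]:
                return False
        # Any other name is an info event and does not affect pairing.
    return not open_sections


events = [
    "begin_sequence", "title", "sequence", "end_sequence",
    "begin_sequence", "title", "sequence", "sequence", "sequence",
    "end_sequence",
]
print(check_sections(events))                       # True
print(check_sections(["begin_sequence", "title"]))  # False
```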

The events a scanner can send must be specifically defined for each
data format.



'noevent' Event
---------------

A data file can contain lines that have no meaningful information,
such as blank lines.  By convention, a scanner should generate the
"noevent" event for these lines.



Scanners
--------

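class Scanner:
    def feed(self, handle, consumer):
        """Read data from handle and send events to consumer."""
        raise NotImplementedError("Subclasses must implement feed()")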


Scanners should implement a method named 'feed' that takes a file
handle and a consumer.  The scanner should read data from the file
handle and generate appropriate events for the consumer.
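
A sketch of what a FASTA scanner might look like is shown below.  The
names FastaScanner, _send, and Recorder are hypothetical.  Two details
are assumptions rather than part of this design: the "noevent"
handler is passed the line, and events for which the consumer has no
method are silently skipped.

```python
import io


class FastaScanner:
    """Sketch of a scanner for FASTA-formatted input."""

    def feed(self, handle, consumer):
        in_sequence = False
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A title line closes the previous record and opens a new one.
                if in_sequence:
                    self._send(consumer, "end_sequence")
                self._send(consumer, "begin_sequence")
                self._send(consumer, "title", line)
                in_sequence = True
            elif line.strip() and in_sequence:
                self._send(consumer, "sequence", line)
            else:
                # Blank lines, and anything before the first title line.
                self._send(consumer, "noevent", line)
        if in_sequence:
            self._send(consumer, "end_sequence")

    @staticmethod
    def _send(consumer, event, *args):
        # Assumption: a consumer without a method for this event ignores it.
        handler = getattr(consumer, event, None)
        if handler is not None:
            handler(*args)


class Recorder:
    """Hypothetical consumer that logs the events it handles."""

    def __init__(self):
        self.log = []

    def begin_sequence(self):
        self.log.append("begin_sequence")

    def title(self, line):
        self.log.append("title " + line)

    def sequence(self, line):
        self.log.append("sequence " + line)

    def end_sequence(self):
        self.log.append("end_sequence")


recorder = Recorder()
FastaScanner().feed(io.StringIO(">seq1\nMAKL\n\n>seq2\nMKLH\nGTAE\n"), recorder)
print("\n".join(recorder.log))
```

The Recorder has no "noevent" method, so the blank line in the input
leaves no trace in its log.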



Consumers
---------

class Consumer:
    # Event handler methods go here, one per event name.
    pass


Consumers contain methods that handle events.  Each method is named
after the event it handles.  Handlers for info events are passed the
line of data containing the information; handlers for section events
are passed nothing.

You are free to ignore events that are not interesting for your
application: simply do not implement methods for those events.

All consumers should be derived from the base Consumer class.
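
One way a base class could make unimplemented events harmless is to
return a do-nothing handler for any unknown event name.  This is only
a sketch under that assumption, not necessarily how the actual base
class is written; TitleConsumer and _ignore are hypothetical names.

```python
class Consumer:
    """Sketch of a base class whose unimplemented handlers do nothing."""

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. no handler is defined.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._ignore

    def _ignore(self, *args):
        pass


class TitleConsumer(Consumer):
    """Hypothetical consumer that only cares about 'title' events."""

    def __init__(self):
        self.titles = []

    def title(self, line):
        self.titles.append(line)


consumer = TitleConsumer()
consumer.begin_sequence()   # not implemented, silently ignored
consumer.title(">seq1")
consumer.sequence("MAKL")   # not implemented, silently ignored
consumer.end_sequence()     # not implemented, silently ignored
print(consumer.titles)      # ['>seq1']
```

Names starting with an underscore still raise AttributeError, so
private attributes and special methods are not mistaken for events.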

An example:

class FASTAConsumer(Consumer):
    def title(self, line):
        # do something with the title
        pass

    def sequence(self, line):
        # do something with the sequence
        pass

    def begin_sequence(self):
        # a new sequence starts
        pass

    def end_sequence(self):
        # a sequence ends
        pass
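
As an illustration of filling in such handlers, the sketch below
collects (title, sequence) pairs.  The name FastaRecordConsumer and
the empty stand-in base class are hypothetical, the events are sent by
hand rather than by a scanner, and the sequence lines are abbreviated.

```python
class Consumer:
    """Stand-in for the base Consumer class."""


class FastaRecordConsumer(Consumer):
    """Hypothetical consumer that builds one record per sequence."""

    def __init__(self):
        self.records = []
        self._title = None
        self._chunks = []

    def begin_sequence(self):
        # Reset the per-record state when a new sequence starts.
        self._title = None
        self._chunks = []

    def title(self, line):
        self._title = line.lstrip(">").strip()

    def sequence(self, line):
        self._chunks.append(line.strip())

    def end_sequence(self):
        # Join the collected lines into a single sequence string.
        self.records.append((self._title, "".join(self._chunks)))


consumer = FastaRecordConsumer()
consumer.begin_sequence()
consumer.title(">gi|132679|sp|P19946|RL15_BACSU 50S RIBOSOMAL PROTEIN L15\n")
consumer.sequence("MKLHELKPSEG\n")
consumer.sequence("GTAEVI\n")
consumer.end_sequence()
print(consumer.records)
```

The section events give the consumer a natural place to reset and
store its state, so it never has to guess where one record ends and
the next begins.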
