
   BioPerl DocBook

Bio::SeqIO HOWTO

Ewan Birney

   EBI

   <[1]birney-at-ebi.ac.uk>

Darinn London

   EBI

   <[2]dlondon-at-ebi.ac.uk>

Brian Osborne

   Cognia Corporation

   <[3]brian-at-cognia.com>

   This document is copyright Ewan Birney, 2002. For reproduction other
   than personal use please contact me at birney-at-ebi.ac.uk

   Abstract

   This HOWTO tries to teach you about the SeqIO system for reading and
   writing sequences of various formats
     _________________________________________________________________

   Table of Contents

   1. [4]The basics
   2. [5]10 second overview
   3. [6]Background Information
   4. [7]Formats
   5. [8]Working Examples
   6. [9]To and from a string
   7. [10]And more examples...
   8. [11]Caveats
   9. [12]Error Handling

1. The basics

   This section assumes you've never seen Bioperl before, perhaps you're
   a biologist trying to get some information about some sequences or
   you're some kind of IT expert interested in learning something about
   this hot topic, "bioinformatics". Your first script may want to get
   some information from a file containing 1 or more sequences.

   A piece of advice: always use the module Bio::SeqIO! Here's what the
   first lines of your script might look like:

#!/bin/perl

use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;


   Why use Bio::SeqIO? In part because SeqIO understands the many
   different sequence file formats and creates the proper Bioperl object
   for each format. Some formats, like "fasta", are minimal. The fasta
   format contains a sequence and some kind of identifier, but nothing
   else is required nor does the format inherently allow for much more
   detail, like a feature (a sub-sequence, usually with some biological
   property - see the [13]Feature-Annotation HOWTO for more information).
   When given fasta SeqIO creates a Bio::Seq object, a more spare object
   than the Bio::Seq::RichSeq object that's created when SeqIO is given
   formats like Genbank or EMBL, which may contain features and
   annotations.

   Now, should you care what kind of Bioperl object is created by SeqIO?
   For the most part no, let SeqIO take care of these details.

2. 10 second overview

   Lots of bioinformatics involves processing sequence information in
   different formats - indeed, there often seems to be about as many
   formats as there are programs for processing them. The Bio::SeqIO
   system handles sequences of many different formats and is the way
   Bioperl pushes sequences in and out of objects. You can think of the
   Bio::SeqIO system as "a smart filehandle for sequences".

3. Background Information

   The SeqIO system handles all of the complexity of parsing sequences of
   many standard formats that scripters have wrestled with over the
   years. Given some way of accessing some sequences (flat files, STDIN
   and STDOUT, variables, etc.), and a format description, it provides
   access to a stream of SeqI objects tailored to the information
   provided by the format. The format description is, technically,
   optional. SeqIO can try to guess based on known file extensions or
   content, but if your source doesn't have a standard file extension or
   comprehensible content, or isn't even a file at all, it throws up its
   hands and tries fasta. Unless you are always working with FASTA files,
   it is a good idea to get into the practice of always specifying the
   format.

   Sequences can be fed into the SeqIO system in a variety of ways. The
   only requirement is that the sequence be contained in some kind of
   standard perl 'Handle'. Most people will make use of the traditional
   handles: file handles, and STDIN/STDOUT. However, perl provides ways
   of turning the contents of a string into a Handle as well (more on
   this below), so just about anything can be fed into SeqIO to get at
   the sequence information contained within it. What SeqIO does is
   create a Handle, or take a given Handle, and parse it into SeqI
   compliant objects, one for each entry at a time. It also knows, for
   each of the supported formats, things like which record-separator
   (e.g. "//" for Genbank, ">header" for fasta, etc.) to use, and most
   importantly, how to parse their key-value based information. SeqIO
   does all of this for you, so that you can focus on the things you want
   to do with the information, rather than worrying about how to get the
   information.

4. Formats

   Bioperl's SeqIO system understands lot of formats and can interconvert
   all of them. Here is a current listing of formats, as of version 1.4.

      Name            Description                  extensions
   abi        ABI tracefile                ab[i1]
   ace        Ace database                 ace
   agave      AGAVE XML                   
   alf        ALF tracefile                alf
   bsml                                    bsm|bsml
   chado      Chado                       
   ctf        CTF tracefile                ctf
   embl       EMBL database                embl|ebl|emb|dat
   exp        EXP format                   exp
   fasta      FASTA                        fasta|fast|seq|fa|fsa|nt|aa
   fastq                                   fastq
   game       GAME XML                    
   gcg        GCG                          gcg
   genbank    GenBank                      gb|gbank|genbank
   interpro   InterProScan XML            
   kegg       KEGG                        
   largefasta                           
   phd        Phred                        phd|phred
   pir        PIR database                 pir
   pln        PLN tracefile                pln
   qual       Phred                       
   raw        plain text                   txt
   scf        Standard Chromatogram Format scf
   swiss      SwissProt                    swiss|sp
   tab        tab-delimited               
   tigr       TIGR XML                    
   tinyseq    NCBI TinySeq XML            
   ztr        ZTR tracefile                ztr

   Table 1. Formats
   [Note] Note

   SeqIO needs the bioperl-ext package to read the scf, abi, alf, pln,
   exp, ctf, and ztr formats.

   For some one of the initial perplexities of Bioperl is the variety of
   different sequence objects, and this gives rise to questions like "How
   do I convert a PrimarySeq object into a RichSeq object?". The answer
   is that one should never have to do this, SeqIO takes care of all
   these conversions. The reason for these different objects in the first
   place has to with the information, or lack of information, inherent to
   the different file formats. Though we just said that the conversions
   are done automatically we offer this table that shows some common
   formats and their corresponding object types, just to satisfy the
   curious.

   Format     Object type
   fasta   Bio::Seq
   genbank Bio::Seq::RichSeq
   pir     Bio::Seq
   embl    Bio::Seq::RichSeq
   raw     Bio::Seq
   ace     Bio::PrimarySeq
   bsml    Bio::Seq::RichSeq
   swiss   Bio::Seq::RichSeq

   Table 2. Formats and Object

5. Working Examples

   The simplest script for parsing sequence files is written out below.
   It prints out the accession number for each entry in the file.

     # first, bring in the SeqIO module

     use Bio::SeqIO;

     # Notice that you do not have to use any Bio:SeqI
     # objects, because SeqIO does this for you. In fact, it
     # even knows which SeqI object to use for the provided
     # format.

     # Bring in the file and format, or die with a nice
     # usage statement if one or both arguments are missing.
     my $usage = "getaccs.pl file format\n";
     my $file = shift or die $usage;
     my $format = shift or die $usage;

     # Now create a new SeqIO object to bring in the input
     # file. The new method takes arguments in the format
     # key => value, key => value. The basic keys that it
     # can accept values for are '-file' which expects some
     # information on how to access your data, and '-format'
     # which expects one of the Bioperl-format-labels mentioned
     # above. Although it is optional, it is good
     # programming practice to provide > and < in front of any
     # filenames provided in the -file parameter. This makes the
     # resulting filehandle created by SeqIO explicitly read (<)
     # or write(>).  It will definitely help others reading your
     # code understand the function of the SeqIO object.

     my $inseq = Bio::SeqIO->new('-file'   => "<$file",
                                 '-format' => $format );
     # Now that we have a seq stream,
     # we need to tell it to give us a $seq.
     # We do this using the 'next_seq' method of SeqIO.

     while (my $seq = $inseq->next_seq) {
           print $seq->accession_number,"\n";
     }
     exit;


   This script takes two arguments on the commandline, and input filename
   and the format of the input file. This is the basic way to access the
   data in a Genbank file. It is the same for fasta, swissprot, ace, and
   all the others as well, provided that the correct Bioperl-format-label
   is provided.

   Notice that SeqIO naturally works over sets of sequences in files, not
   just one sequence. Each call to next_seq will return the next sequence
   in the 'stream', or null if the end of the file/stream has been
   reached. This allows you to read in the contents of your data one
   sequence at a time, which conserves memory, in contrast with pulling
   everything into memory first. The null that is returned at the end of
   file/stream is important, as it allows you to wrap successive calls to
   next_seq in a while loop. This code snippet would load up all the
   sequences in a EMBL file into an array:

     use strict;
     use Bio::SeqIO;

     my $input_file = shift;

     my $seq_in  = Bio::SeqIO->new( -format => 'embl',
                                    -file => $input_file);

     # loads the whole file into memory - be careful
     # if this is a big file, then this script will
     # use a lot of memory

     my $seq;
     my @seq_array;
     while( $seq = $seq_in->next_seq() ) {
        push(@seq_array,$seq);
     }

     # now do something with these. First sort by length,
     # find the average and median lengths and print them out

     @seq_array = sort { $a->length <=> $b->length } @seq_array;

     my $total = 0;
     my $count = 0;
     foreach my $seq ( @seq_array ) {
        $total += $seq->length;
        $count++;
     }

     print "Mean length ",$total/$count," Median ",$seq_array[$count/2],"\n";


   Now, what if we want to convert one format to another? When you create
   a Bio::SeqIO object to read in a flat file, the magic behind the
   curtains is that each call to 'next_seq' is a complex parsing of the
   next sequence record into a SeqI object - not a single line, but the
   entire record!! It knows when to start parsing, and when to stop and
   wait for the next call to next_seq. It knows how to get at the
   DIVISION information stored on the LOCUS line, etc. To get that SeqI
   information back out to a new file, of a different format (or of the
   same format, but with sequences grouped in a new way), Bio::SeqIO has
   a second method, called 'write_seq' that revearses the process done by
   next_seq. It knows how to write all of the data contained in the SeqI
   object into the right places, with the correct labels, for any of the
   supported formats. Let's make this more concrete by writing a
   universal format translator:


          use Bio::SeqIO;
          # get command-line arguments, or die with a usage statement
          my $usage = "x2y.pl infile infileformat outfile outfileformat\n";
          my $infile = shift or die $usage;
          my $infileformat = shift or die $usage;
          my $outfile = shift or die $usage;
          my $outfileformat = shift or die $usage;

          # create one SeqIO object to read in,and another to write out
          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                       '-format' => $infileformat);
          my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                                        '-format' => $outfileformat);

          # write each entry in the input file to the output file
          while (my $inseq = $seq_in->next_seq) {
             $seq_out->write_seq($inseq);
          }
          exit;


   You can think of the two variables, $seq_in and $seq_out as being
   rather special types of filehandles which "know" about sequences and
   sequence formats. However, rather than using the <F> operator to read
   files you use the $seqio->next_seq() method and rather than saying
   "print F $line" you say $seqio->write_seq($seq_object).
   [Note] Note

   Bio::SeqIO actually allows you to make use of a rather scary/clever
   part of Perl that can "mimic" filehandles, so that the <F> operator
   returns sequences and the print F operator writes sequences. However,
   for most people, including myself, this looks really really weird and
   leads to probably more confusion.

   Notice that the universal formatter only required a few more lines of
   code than the accession number lister and mean sequence length
   analyzer (mostly to get more command-line args). This is the beauty of
   using the Bioperl system. It doesn't take a lot of code to do some
   really complex things.

   Now, let's play around with the previous code, changing aspects of it
   to exploit the functionality of the SeqIO system. Let's take a stream
   from STDIN, so that we can use other programs to stream data of a
   particular format into the program, and write out a file of a
   particular format. Here we have to make use of two new things: one
   perl-specific, and one SeqIO-specific. Perl allows you to 'GLOB' a
   filehandle by placing a '*' in front of the handle name, making it
   available for use as a variable, or as in this case, as an argument to
   a function. In concert, SeqIO allows you to pass a GLOB'ed filehandle
   to it using the '-fh' parameter in place of the '-file' parameter.
   Here is a program that takes a stream of sequences in a given format
   from STDIN, meaning it could be used like this:

     >cat myseqs.fa | all2y.pl fasta newseqs.gb genbank


   The code for all2y.pl is:

     use Bio::SeqIO;
     # get command-line arguments, or die with a usage statement
     my $usage = "all2y.pl informat outfile outfileformat\n";
     my $informat = shift or die $usage;
     my $outfile = shift or die $usage;
     my $outformat = shift or die $usage;

     # create one SeqIO object to read in, and another to write out
     # *STDIN is a 'globbed' filehandle with the contents of Standard In
     my $seqin = Bio::SeqIO->new(-fh     => \*STDIN,
                                 -format => $informat);
     my $seqout = Bio::SeqIO->new(-file   => ">$outfile",
                                  -format => $outformat);

     # write each entry in the input file to the output file
     while (my $inseq = $seqin->next_seq) {
          $seqout->write_seq($inseq);
     }
     exit;


   Why use files at all? We can pipe STDIN to STDOUT, which could allow
   us to plug this into some other pipeline, something like:

     >cat *.seq | in2out.pl EMBL Genbank | someother program


   The code for in2out.pl could be:

     use Bio::SeqIO;
     # get command-line arguments, or die with a usage statement
     my $usage = "in2out.pl informat outformat\n";
     my $informat = shift or die $usage;
     my $outformat = shift or die $usage;

     # create one SeqIO object to read in, and another to write out
     my $seqin = Bio::SeqIO->new(-fh     => \*STDIN,
                                 -format => $informat);
     my $outseq = Bio::SeqIO->new(-fh     => \*STDOUT,
                                  -format => $outformat);

     # write each entry in the input to the output
     while (my $inseq = $seqin->next_seq) {
            $outseq->write_seq($inseq);
     }
     exit;


6. To and from a string

   A popular question many people have asked is "What if I have a string
   that has a series of sequence records in some format, and I want to
   make it a Seq object?" You might want to do this if you allow users to
   paste in sequence data into a web form, and then do something with
   that sequence data. This can be accomplished using the -fh parameter,
   along with perl's IO::String module that allows you to turn the
   contents of a string into a standard globbed perl handle. This isn't a
   complete program, but gives you the most relevant bits. Assume that
   there is some type of CGI form processing, or some such business, that
   pulls a group of sequences into a variable, and also pulls the format
   definition into another variable.


        use IO::String;
        use Bio::SeqIO;

        # get a string into $string somehow, with its format in
        # $format, say from a web form
        my $stringfh = new IO::String($string);
        my $seqio = Bio::SeqIO-> new(-fh     => $stringfh,
                                     -format => $format);

        while( my $seq = $seqio->next_seq ) {
             # process each seq
        }
        exit;


   Naturally you can also take a sequence object and write it, in some
   format, to a string. The code would look something like this:

       use IO::String;
       use Bio::SeqIO;

       my $str;
       my $io = IO::String->new(\$str);
       my $seqOut = Bio::SeqIO->new(-format => 'swiss',
                                    -fh     => $io );
       $seqOut->write_seq($seq_obj);
       print $str;


7. And more examples...

   The -file parameter in SeqIO can take more than a filename. It can
   also take a string that tells it to 'pipe' something else into it.
   This is of the form '-file' => 'command |'. Notice the vertical bar at
   the end, just before the single quote. This is especially useful when
   you are working with large, gzipped files because you just don't have
   enough disk space to unzip them (e.g. a Genbank full release file),
   but can make fasta files from them. Here is a program that takes a
   gzipped file of a given format and writes out a FASTA file, used like:

      >gzip2fasta.pl gbpri1.seq.gz Genbank gbpri1.fa


   Let the the code begin...


      use Bio::SeqIO;
      # get command-line arguments, or die with a usage statement
      my $usage = "gzip2fasta.pl infile informat outfile\n";
      my $infile = shift or die $usage;
      my $informat = shift or die $usage;
      my $outfile = shift or die $usage;

      # create one SeqIO object to read in, and another to write out
      my $seqin = Bio::SeqIO->new(-file   => "/usr/local/bin/gunzip -c $infile
|",
                                  -format => $informat);

      my $seqout = Bio::SeqIO->new(-file   => ">$outfile",
                                   -format => 'Fasta');

      # write each entry in the input to the output file
      while (my $inseq = $seqin->next_seq) {
            $seqout->write_seq($inseq);
      }
      exit;


   Bioperl also allows a 'pipe - out' to be given as an argument to
   -file. This is of the form '-file' => "| command". This time the
   vertical bar is at the beginning, just after the first quote. Let's
   write a program to take an input file, and write it directly to a
   WashU Blastable Database, without ever writing out a fasta file, like:


     >any2wublastable.pl myfile.gb Genbank mywublastable p


   And the code for any2wublastable.pl is:

     use Bio::SeqIO;

     # get command-line arguments, or die with a usage statement
     my $usage = "any2wublastable.pl infile informat outdbname outdbtype\n";
     my $infile = shift or die $usage;
     my $informat = shift or die $usage;
     my $outdbname = shift or die $usage;
     my $outdbtype = shift or die $usage;

     # create one SeqIO object to read in, and another to write out
     my $seqin = Bio::SeqIO->new(-file   => "<$infile",
                                 -format => $informat);
     my $seqout = Bio::SeqIO->new(-file =>
        "| /usr/local/bin/xdformat -o $outdbname -${outdbtype} -- -",
                                  -format => 'Fasta');

     # write each entry in the input to the output
     while (my $inseq = $seqin->next_seq) {
            $seqout->write_seq($inseq);
     }
     exit;


   Some of the more seasoned perl hackers may have noticed that the 'new'
   method returns a reference, which can be placed into any of the data
   structures used in perl. For instance, let's say you wanted to take a
   genbank file with multiple sequences, and split the human sequences
   out into a human.gb file, and all the rest of the sequences into the
   other.gb file. In this case, we will use a hash to store the two
   handles where 'human' is the key for the human output, and 'other' is
   the key to other, so the usage would be:

      >splitgb.pl inseq.gb


   Here's what splitgb.pl looks like:


      use Bio::SeqIO;

      # get command-line argument, or die with a usage statement
      my $usage = "splitgb.pl infile\n";
      my $infile = shift or die $usage;
      my $inseq = Bio::SeqIO->new('-file' => ">$infile",
                                  '-format' => 'Genbank');

      my %outfiles = (
                      'human' => Bio::SeqIO->new('-file' => '>human.gb',
                                                 '-format' => 'Genbank'),
                      'other' => Bio::SeqIO->new('-file' => '>other.gb',
                                                 '-format' => 'Genbank')
                     );

     while (my $seqin = $inseq->next_seq) {
             # here we make use of the species attribute, which returns a
             # species object, which has a binomial attribute that
             # holds the binomial species name of the source of the sequence
             if ($seqin->species->binomial =~ m/Homo sapiens/) {
                $outfiles{'human'}->write_seq($seqin);
             } else {
                $outfiles{'other'}->write_seq($seqin);
             }
     }
     exit;


   Now, let's use a multidimensional hash to hold a genbank output and a
   fasta output for both splits.


     use Bio::SeqIO;
     # get command-line argument, or die with a usage statement
     my $usage = "splitgb.pl infile\n";
     my $infile = shift or die $usage;
     my $inseq = Bio::SeqIO->new('-file'   => ">$infile",
                                ' -format' => 'Genbank');

     my %outfiles = ('human' => {
                                 'Genbank' => Bio::SeqIO->new('-file' => '>huma
n.gb',
                                                              '-format' => 'Gen
bank'),
                                 'Fasta' => Bio::SeqIO->new('-file' => '>human.
fa',
                                                            '-format' => 'Fasta
')
                                },
                     'other' => {
                                 'Genbank' => Bio::SeqIO->new('-file' => '>othe
r.gb',
                                                              '-format' => 'Gen
bank'),
                                 'Fasta' => Bio::SeqIO->new('-file' => '>other.
fa',
                                                            '-format' => 'Fasta
')
                                }
                    );
                    while (my $seqin = $inseq->next_seq) {
                            if ($seqin->species->binomial =~ m/Homo sapiens/) {
                               $outfiles{'human'}->{'Genbank'}->write_seq($seqi
n);
                               $outfiles{'human'}->{'Fasta'}->write_seq($seqin)
;
                            }
                            else {
                               $outfiles{'other'}->{'Genbank'}->write_seq($seqi
n);
                               $outfiles{'other'}->{'Fasta'}->write_seq($seqin)
;
                            }
                   }
                   exit;


   And finally, you might want to make use of the SeqIO object in a perl
   one-liner. Perl one-liners are perl programs that make use of flags to
   the perl binary allowing you to run programs from the command-line
   without actually needing to write a script into a file. The -e flag
   takes a quoted string, usually single quoted, and attempts to execute
   it as code, while the -M flag takes a module name and tells the
   one-liner to use that module. When using a single quote to enclose the
   string to -e, you also have to make use of perl's string modifier
   'q(string)' to single quote a string without confusing the shell.
   Let's find out how many GSS sequences are in gbpri1.seq.gz. Note that
   we have placed new-line characters in this to make it easier to read,
   but in practice you wouldn't actually hit the return key until you
   were ready to run the program.

      >perl -MBio::SeqIO -e 'my $gss = 0; my $in =  Bio::SeqIO->new(q(-file) =>
 q(/usr/local/bin/gunzip -c gbpri1.seq.gz |),
      q(-format) => q(Genbank)); while (my $seq = $in->next_seq) { $gss++ if ($
seq->keywords =~ m/GSS/);}
      print "There are $gss GSS sequences in gbpri1.seq.gz\n";'


8. Caveats

   Because Bioperl uses a single, generalized data structure to hold
   sequences from all formats, it does impose its own structure on the
   data. For this reason, a little common sense is necessary when using
   the system. For example, a person who takes a flat file pulled
   directly from Genbank, and converts it to another Genbank file with
   Bioperl, will be surprised to find subtle differences between the two
   files - try "diff origfile newfile" to see what we are talking about.
   Just remember when using Bioperl that it was never designed to "round
   trip" your favorite formats. Rather, it was designed to store sequence
   data from many widely different formats into a common object framework
   and make that framework available to other sequence manipulation tasks
   in a programmatic fashion.

9. Error Handling

   If you gave an impossible filename to the first script, it would have
   in fact died with an informative error message. In programming jargon,
   this is called "throwing an exception". An example would look like:

       [localhost:~/src/Bioperl-live] birney% perl t.pl bollocks silly

       ------------- EXCEPTION  -------------
       MSG: Could not open bollocks for reading: No such file or directory
       STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:259
       STACK Bio::SeqIO::_initialize Bio/SeqIO.pm:441
       STACK Bio::SeqIO::genbank::_initialize Bio/SeqIO/genbank.pm:122
       STACK Bio::SeqIO::new Bio/SeqIO.pm:359
       STACK Bio::SeqIO::new Bio/SeqIO.pm:372
       STACK toplevel t.pl:9

       --------------------------------------


   These exceptions are very useful when errors occur because you can see
   the full route, or "stack trace", of where the error occurred and
   right at the end of this is the line number of the script which caused
   the error, which in this case we called t.pl.

   The fact that these sorts of errors are automatically detected and by
   default cause the script to stop is a good thing, but you might want
   to handle these yourself. To do this you need to "catch the exception"
   as follows:

    use strict;
    use Bio::SeqIO;

    my $input_file = shift;
    my $output_file = shift;

    # we have to declare $seq_in and $seq_out before
    # the eval block as we want to use them afterwards

    my $seq_in;
    my $seq_out;

    eval {
     $seq_in  = Bio::SeqIO->new( -format => 'genbank',
                                 -file => $input_file);

    $seq_out = Bio::SeqIO->new( -format => 'fasta',
                                -file => ">$output_file");
    };
    if( $@ ) { # an error occurred
      print "Was not able to open files, sorry!\n";
      print "Full error is\n\n$@\n";
      exit(-1);
    }
    my $seq;
    while( $seq = $seq_in->next_seq() ) {
       $seq_out->write_seq($seq);
    }


   The use of eval { ... } accompanied by testing the value of the $@
   variable (which is set on an error) is a generic Perl approach, and
   will work with all errors generated in a Perl program, not just the
   ones in Bioperl. Notice that we have to declare $seq_in and $seq_out
   using "my" before the eval block - a common gotcha is to wrap a eval
   block around some "my" variables inside the block - and now "my"
   localizes those variables only to that block. If you "use strict" this
   error will be caught. And, of course, you are going to "use strict"
   right?

   BioPerl DocBook

References

   1. mailto:birney-at-ebi.ac.uk
   2. mailto:dlondon-at-ebi.ac.uk
   3. mailto:brian-at-cognia.com
   4. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#basics
   5. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#overview
   6. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#background
   7. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#formats
   8. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#working
   9. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#string
  10. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#and_more
  11. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#caveats
  12. file://localhost/cygdrive/c/docbook/deploy/SeqIO/SeqIO.html#errors
  13. http://bioperl.org/HOWTOs/Feature-Annotation/index.html
