

1- Code API and executables needed
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  bioperl-live (bioperl-1-2-0?)
  ensembl
  ensembl-compara
  ensembl-hive
  ensembl-pipeline
  ensembl-analysis

  executables
  ~~~~~~~~~~~
  blastz
      using /usr/local/ensembl/bin/blastz

  or
  blat
     using /usr/local/ensembl/bin/blat-32

1.2 Code checkout

    bioperl code

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
    when prompted, the password is 'cvs'
    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl co -r branch-1-2 -d bioperl-1-2 bioperl-live

    core ensembl code

    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl login
    When prompted the password is 'CVSUSER'. 
    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl co ensembl

    ensembl-compara code (for compara API and some RunnableDBs or Hive::Process)
    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl co ensembl-compara
    
    ensembl-analysis code (for Runnables/RunnableDBs)
    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl co ensembl-analysis

    ensembl-hive code (Bio::EnsEMBL::Hive pipeline infrastructure)
    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl co ensembl-hive

    ensembl-pipeline code (Bio::EnsEMBL::Pipeline pipeline infrastructure; this pipeline still has some
    dependencies on this code. Hopefully this will not be the case in the near future)
    cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl co ensembl-pipeline

in tcsh
    setenv BASEDIR   /some/path/to/modules
    setenv PERL5LIB  ${BASEDIR}/ensembl/modules:${BASEDIR}/ensembl-pipeline/modules:${BASEDIR}/bioperl-live:${BASEDIR}/ensembl-compara/modules:${BASEDIR}/ensembl-hive/modules:${BASEDIR}/ensembl-analysis/modules
    setenv PATH $PATH:${BASEDIR}/ensembl-compara/scripts/pipeline:${BASEDIR}/ensembl-hive/scripts

in bash
    export BASEDIR=/some/path/to/modules
    export PERL5LIB=${BASEDIR}/ensembl/modules:${BASEDIR}/ensembl-pipeline/modules:${BASEDIR}/bioperl-live:${BASEDIR}/ensembl-compara/modules:${BASEDIR}/ensembl-hive/modules:${BASEDIR}/ensembl-analysis/modules
    export PATH=$PATH:${BASEDIR}/ensembl-compara/scripts/pipeline:${BASEDIR}/ensembl-hive/scripts
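
A quick check of the settings above can catch a mistyped path before any
script is run. The sketch below (bash) lists every PERL5LIB entry that is not
an existing directory. The function name check_perl5lib is our own choice and
is not part of any Ensembl package.

```shell
# Hypothetical helper (not part of Ensembl): print each PERL5LIB entry
# that is not an existing directory. Prints nothing if all entries exist.
check_perl5lib() {
    local entry
    local IFS=:                 # split PERL5LIB on colons
    for entry in $PERL5LIB; do
        [ -d "$entry" ] || echo "missing: $entry"
    done
}
```

Run check_perl5lib after setting the variables. Any "missing:" line points at
a checkout that is absent or was checked out under a different directory name.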

Copy General.pm.example to General.pm
cp ${BASEDIR}/ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Config/General.pm.example ${BASEDIR}/ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Config/General.pm

2- Create a hive/compara database
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pick a mysql instance and create a database

mysql -h ia64g -P3306 -uensadmin -pxxxx -e "create database abel_compara_human_cow_blastz_27"

cd ~/src/ensembl_main/ensembl-compara/sql
mysql -h ia64g -P3306 -uensadmin -pxxxx abel_compara_human_cow_blastz_27 < table.sql
mysql -h ia64g -P3306 -uensadmin -pxxxx abel_compara_human_cow_blastz_27 < pipeline-tables.sql

cd ~/src/ensembl_main/ensembl-hive/sql
mysql -h ia64g -P3306 -uensadmin -pxxxx abel_compara_human_cow_blastz_27 < tables.sql
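
The same four commands can be generated from variables, so that the database
name and connection details are typed only once. This is a sketch (bash). The
variable names and the function print_db_setup are our own, and the values are
the examples used in this document. It only prints the commands and does not
run them.

```shell
# Sketch only: connection details held in variables. The values are the
# examples used in this document; xxxx is a placeholder password.
DBHOST=ia64g
DBPORT=3306
DBUSER=ensadmin
DBPASS=xxxx
DBNAME=abel_compara_human_cow_blastz_27
SRC="$HOME/src/ensembl_main"

# Hypothetical helper (not part of Ensembl): print the four setup commands
# so they can be reviewed before being run.
print_db_setup() {
    local conn="mysql -h $DBHOST -P$DBPORT -u$DBUSER -p$DBPASS"
    echo "$conn -e \"create database $DBNAME\""
    echo "$conn $DBNAME < $SRC/ensembl-compara/sql/table.sql"
    echo "$conn $DBNAME < $SRC/ensembl-compara/sql/pipeline-tables.sql"
    echo "$conn $DBNAME < $SRC/ensembl-hive/sql/tables.sql"
}

print_db_setup
```

Once the printed commands look right, they can be run with
print_db_setup | sh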




3- Choose a working directory with some disk space
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The pair aligner pipeline tries to minimize the amount of output data written by the workers. But if
the debug options are on, this output can take a lot of space. So be careful.

mkdir -p /acari/work7a/abel/hive/abel_compara_human_cow_blastz_27/workers

This directory can be set in the 'hive_output_dir' variable in the compara-hive
configuration file. Unless you know what you are doing, we recommend setting it to an empty string
'hive_output_dir' => ""
or not setting it at all. In both cases, all STDOUT/STDERR goes to /dev/null.

# IMPORTANT: The hive system can generate an awful lot of log output that is dumped in
# the hive_output_dir. When a pipeline runs fine, this output is not needed and can take a lot of
# disk space as well as generate a large number of files. If you don't want log output (RECOMMENDED),
# then just don't specify any hive_output_dir (delete or comment out the line, or set it to "", if you
# don't want any STDOUT/STDERR files)
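
If you do set a hive_output_dir, it is worth watching how much it grows. The
sketch below (bash) reports the number of files and the disk usage of a
directory, using only find and du. The function name hive_log_usage is our own
and is not part of the hive code.

```shell
# Hypothetical helper (not part of ensembl-hive): report how many files a
# directory holds and how much disk space it uses, in kilobytes.
hive_log_usage() {
    local dir="$1"
    local files size
    files=$(find "$dir" -type f | wc -l | tr -d ' ')   # number of regular files
    size=$(du -sk "$dir" | cut -f1)                    # total size in KB
    echo "$dir: $files files, $size KB"
}
```

For example:
hive_log_usage /acari/work7a/abel/hive/abel_compara_human_cow_blastz_27/workers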

4- Copy and modify your compara-hive config file
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

cd /acari/work7a/abel/hive/abel_compara_human_cow_blastz_27

cp ~/src/ensembl_main/ensembl-compara/scripts/pipeline/compara-hive-pairaligner.conf.example compara_human_cow_27.conf
<editor> compara_human_cow_27.conf

You may need to change the database names and ports, and set the
path in 'hive_output_dir' to
/acari/work7a/abel/hive/abel_compara_human_cow_blastz_27/workers

Also, the genome with the best assembly should be used as the target.

5- Copy data from the master
   ~~~~~~~~~~~~~~~~~~~~~~~~~

cd ~/src/ensembl_main/ensembl-compara/scripts/pipeline  

populate_new_database.pl --master compara_master --species "Homo sapiens" --species "Bos taurus" --new abel_compara_human_cow_blastz_27 

6- Run the configure scripts
   ~~~~~~~~~~~~~~~~~~~~~~~~~
The following scripts are in ensembl-compara/scripts/pipeline (should be in your PATH)
The order of execution is important

comparaLoadGenomes.pl -conf compara_human_cow_27.conf
loadPairAlignerSystem.pl -conf compara_human_cow_27.conf

The comparaLoadGenomes script uses the information in the conf file to connect to
the core databases and queries them for things like taxon_id, assembly, gene_build, and names
to create entries in the genome_db table.  It also sets the genome_db.locator column to
allow the system to know where the respective core databases are located. This script
will also 'seed' the pipeline/hive system by creating the first jobs in the analysis_job
table for the analysis 'SubmitGenome'.

The loadPairAlignerSystem script creates the analysis entries for the processing
system, and creates both the dataflow rules and the analysis control rules.
It also initializes the analysis_stats row for each analysis.  These rows hold
information like batch_size, hive_capacity, and run-time stats that the Hive's
Queen will update.

These scripts may give you warnings if the output directories are not available or if
they are unable to connect to the core databases.

At this point the system is ready to run


7- Run the beekeeper
   ~~~~~~~~~~~~~~~~~
The following script is in ensembl-hive/scripts (should be in your PATH)
beekeeper.pl -url mysql://ensadmin:xxxx@ia64g:3306/abel_compara_human_cow_blastz_27 -loop

where xxxx is the password for write access to the database
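
The same URL carries everything the mysql client needs, so it can be turned
into client arguments for looking at the database by hand. This is a sketch
(bash) using only shell parameter expansion. The function name
hive_url_to_mysql_args is our own, and it assumes the password contains no
'@', ':' or '/' characters.

```shell
# Hypothetical helper (not part of ensembl-hive): split a URL of the form
#   mysql://user:pass@host:port/dbname
# into mysql client arguments, printed on one line.
hive_url_to_mysql_args() {
    local url="${1#mysql://}"       # user:pass@host:port/dbname
    local cred="${url%%@*}"         # user:pass
    local rest="${url#*@}"          # host:port/dbname
    local hostport="${rest%%/*}"    # host:port
    echo "-h ${hostport%%:*} -P${hostport##*:} -u${cred%%:*} -p${cred#*:} ${rest#*/}"
}
```

For the URL above this prints
-h ia64g -P3306 -uensadmin -pxxxx abel_compara_human_cow_blastz_27
which are the same arguments used with mysql in step 2.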

For more details on controlling/monitoring the hive system see
beekeeper.pl -help



