# -*-mode:org;  org-todo-keywords: '((sequence "todo" "done"))-*-
# Copyright 2009, 2010 Joel James Adamson <adamsonj@email.unc.edu>

# $Id$

# Joel J. Adamson	-- http://www.unc.edu/~adamsonj
# University of North Carolina at Chapel Hill
# CB #3280, Coker Hall
# Chapel Hill, NC 27599-3280
# <adamsonj@email.unc.edu>

# This file is part of haploid

# haploid is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.

# haploid is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.

# You should have received a copy of the GNU General Public License
# along with haploid.  If not, see <http://www.gnu.org/licenses/>.
* This file is obsolete
  We now use a task manager / ticket system on Savannah:

  https://savannah.nongnu.org/task/?group=haploid
* In the near future
** todo Style
*** Take hints from _Practice of Programming_, Chapter 1
*** done Replace macros with inline functions?
    e.g. isset () instead of B_IS_SET
*** Function names
    - use "package" names with a tag at the beginning (this may require
      changing some file names to shorter names)
    - use library name
    - use macros to define names, so that internally the functions
      will have the same names as defined in the file:
      #define haploid_isset bits_isset
      _Bool haploid_isset (...)
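    A minimal sketch of both ideas above: an inline function in place
    of a B_IS_SET-style macro, defined under a macro-mapped name.  The
    argument list and bit numbering are assumptions; the note only
    gives the two names.

```c
#include <stdbool.h>

/* Map the name written in the source file onto the internal symbol,
   as in the note above.  After preprocessing, the function defined
   below is called bits_isset. */
#define haploid_isset bits_isset

/* Return true if bit POS of X is set (bit 0 is least significant).
   Assumes POS is smaller than the width of unsigned int. */
static inline bool
haploid_isset (unsigned int x, unsigned int pos)
{
  return ((x >> pos) & 1u) != 0;
}
```

    Either name can be used at a call site in the same file, since the
    macro rewrites one into the other.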
** Archive							    :ARCHIVE:
*** done Stop function
    :PROPERTIES:
    :ARCHIVE_TIME: 2009-12-10 Thu 14:04
    :END:
    Write a generalized function that will execute a non-local exit
    when two vectors have converged.  That will avoid having to code
    the same damn thing every time.  It should also take an argument of
    a tolerance.
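    A sketch of the convergence test itself, without the non-local
    exit: it only reports whether two vectors agree to within a
    tolerance and leaves the decision to stop to the caller.  The name
    vec_converged and its signature are hypothetical, not the
    library's sim_stop_ck ().

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when every element of A is within TOL of the matching
   element of B.  A NaN difference counts as "not converged". */
bool
vec_converged (size_t len, const double a[], const double b[], double tol)
{
  for (size_t i = 0; i < len; i++)
    {
      double d = a[i] - b[i];
      if (d < 0.0)
        d = -d;
      if (!(d <= tol))
        return false;
    }
  return true;
}
```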
**** <2009-12-10 Thu>
     The test case now works, and returns success, but is clearly
     semantically wrong.  The iteration only runs once.  The sense of
     logic may be reversed in sim_stop_ck ().  Run it in gdb and
     determine how the loop progresses.
*** done GNU99 std
    :PROPERTIES:
    :ARCHIVE_TIME: 2009-12-15 Tue 16:41
    :END:
    this should go in Makefile.am; look up how to do it
*** done Fix up set_rec_table
    :PROPERTIES:
    :ARCHIVE_TIME: 2009-12-15 Tue 16:41
    :END:
**** Pointer arithmetic
     This function should be internal and we can use indexed pointers
     instead of 3d array.

     This algorithm is set, however we need to find out how exactly the
     elements of unsigned int * zyg are assigned.  zygotes[] is
     initialized at size 4, however zygote_genotypes () only assigns
     two positions in the array (through * zyg).  Why not assign all
     four?  There must be some shortcut.  This seems like a piece of
     heavily K&R style coding, courtesy of MK.
*** done Use conditional compilation
    CLOSED: [2009-12-16 Wed 13:39]
    :PROPERTIES:
    :ARCHIVE_TIME: 2009-12-16 Wed 13:39
    :END:
    Use #if and so on for two-sex numbers of genotypes and so on: in
    recombination () you could set #if GENO != FGENO to define length
    of arrays; i.e. user should define numbers of genotypes and
    alleles, loci, etc.

    <2009-12-16 Wed>

    Conditional compilation just doesn't make sense.  Males and females
    need to always have the same numbers of loci for recombination to
    work (unless we're dealing with non-autosomal inheritance, which we
    won't for a while).

    The better solution is to calculate GENO from some other quantity,
    rather than defining it as a macro.  This way we can have mutations
    crop up and change the number of genotypes on the fly (or bird, as
    the case may be).

    For now the solution is that the user (application programmer)
    needs to define NLOCI and GENO
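    One way to calculate the number of genotypes instead of defining
    it, assuming haploid genotypes with two alleles per locus (so the
    count is 2^NLOCI).  The function name geno_count is hypothetical.

```c
/* Number of haploid genotypes for NLOCI diallelic loci: 2^NLOCI.
   Assumes NLOCI is smaller than the width of unsigned int. */
static inline unsigned int
geno_count (unsigned int nloci)
{
  return 1u << nloci;
}
```

    With two loci this gives the four genotypes of the TLTA case.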
*** done Drop non-local exit from sim_stop_ck ()
    :PROPERTIES:
    :ARCHIVE_TIME: 2009-12-29 Tue 21:35
    :END:
    - totally doesn't make sense in a program with multiple starting
      values
*** Test case for recombination
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-03 Sun 21:06
    :END:
**** done TLTA
***** done Change to reporting allele frequencies and D instead of genotype frequencies
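      A sketch of the summary, assuming the four genotype frequencies
      are ordered AB, Ab, aB, ab.  That ordering, the struct, and the
      function name are assumptions for illustration.  D is the usual
      linkage disequilibrium, x_AB * x_ab - x_Ab * x_aB.

```c
/* Allele frequencies of A and B, and linkage disequilibrium D. */
struct tlta_summary
{
  double p_a;
  double p_b;
  double d;
};

/* X holds genotype frequencies in the assumed order AB, Ab, aB, ab. */
struct tlta_summary
tlta_summarize (const double x[4])
{
  struct tlta_summary s;
  s.p_a = x[0] + x[1];               /* AB + Ab */
  s.p_b = x[0] + x[2];               /* AB + aB */
  s.d = x[0] * x[3] - x[1] * x[2];   /* coupling minus repulsion */
  return s;
}
```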
***** Basic design
      1. Generate a fitness function for the four genotypes.  
      2. Have them mate randomly (define a mating table full of ones)
         to generate next generation numbers (relative frequencies)
      3. get new frequencies from new numbers divided by the sum of all
         new numbers
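      Steps 1 and 3 can be sketched as a single pass: weight each
      frequency by its fitness, then divide by the sum.  The mating
      step is left out, and the name select_normalize is hypothetical.

```c
#include <stddef.h>

/* Weight each frequency by its fitness and renormalize so that the
   result sums to one.  Returns the sum of the weighted values (the
   mean fitness when FREQ sums to one).  If that sum is not positive,
   FREQ is left unchanged. */
double
select_normalize (size_t len, double freq[], const double fitness[])
{
  double total = 0.0;
  for (size_t i = 0; i < len; i++)
    total += freq[i] * fitness[i];
  if (!(total > 0.0))
    return total;
  for (size_t i = 0; i < len; i++)
    freq[i] = freq[i] * fitness[i] / total;
  return total;
}
```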

***** Life cycle
      zygote => selection => adult => mating => recombination => zygote
**** done Fix recombination algorithm
     - it's not giving me the right values under [[TLTA]]
     - something about the diagonal entries ...
     - I'm getting values over 1
*** done Qualifiers
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-03 Sun 21:06
    :END:
    - mask_lim et al can be const
    - we may want to use restrict for rec_table
*** done Promote functions to library
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-03 Sun 21:33
    :END:
    - take functions from tlta that are generalizable
    - create some new header files
    - wait on bithacks until it can be extended to include more useful
      operations
*** done Library functions need runtime array lengths
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-04 Mon 15:52
    :END:
    - the only reason tlta worked is because haplibpriv sets GENO and
      NLOCI to the right values
    - C99 (GNU99?) allows array declarations in formal parameters with
      other formal parameters as dimensions; this may be the way out.
    - All library functions will therefore need a "len" argument, a la
      gen_mean ()
    - other options:
      + leave array dimension unspecified
      + use pointer notation in formal parameters and array notation
        in the function body
      + we still need the dimensions for iteration purposes
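    A sketch of the C99 form mentioned above, where an earlier formal
    parameter gives the dimension of a later array parameter.  The
    function vec_mean is a hypothetical stand-in, not the library's
    gen_mean ().

```c
#include <stddef.h>

/* Mean of the LEN elements of V.  LEN comes first so that it can
   appear in the array declarator.  Returns 0 for an empty vector. */
double
vec_mean (size_t len, const double v[len])
{
  double sum = 0.0;
  for (size_t i = 0; i < len; i++)
    sum += v[i];
  return len ? sum / (double) len : 0.0;
}
```

    The declarator documents the expected length, but the compiler
    does not enforce it; the function still relies on len for
    iteration.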
*** done Manual
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-29 Fri 11:20
    :END:
    - As soon as TLTA works, we need a manual to keep all this straight
*** todo Return values
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-01-29 Fri 11:21
    :END:
    - some functions return integers but they do no error-checking
      (e.g. set_rec_table)
    - set_rec_table does error-checking, but is still error-prone;
      there is no guarantee that over-reaching r[] will return NaN for
      zygote_prob; is there just some way to check the size of the array?
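    On the last question: once an array has been passed to a function
    it is only a pointer, so the callee cannot recover its size; the
    length has to travel with it.  A sketch of a checked helper that
    takes the length explicitly and returns a status code.  The name
    check_probs, the status values, and the [0, 1] range are
    assumptions.

```c
#include <stddef.h>

enum { PROB_OK = 0, PROB_BAD = -1 };

/* Return PROB_OK if all LEN entries of R lie in [0, 1], PROB_BAD
   otherwise.  The negated comparison also rejects NaN. */
int
check_probs (size_t len, const double r[])
{
  if (r == NULL)
    return PROB_BAD;
  for (size_t i = 0; i < len; i++)
    if (!(r[i] >= 0.0 && r[i] <= 1.0))
      return PROB_BAD;
  return PROB_OK;
}
```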
*** done Debugging set_rec_table
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-09 Tue 13:24
    :END:
    - print rec_tables of varying sizes and recombination coefficients
    - solve known examples beforehand to check them
    - print them mating-by-mating (i.e. print the offspring
      probabilities of each offspring genotype for a particular mating)
*** done Use error instead of perror
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-24 Wed 14:05
    :END:
*** done Abstract bit-twiddling
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-24 Wed 14:18
    :END:
**** zygote_genotypes ()
     zygote_genotypes () and other functions in recomb.c could be
     abstracted to higher-level programming
**** Bit-twiddling plan
    - formulate a general plan for how to deal with bit-twiddling;
      this requires a policy
      1. Find all the places with bitwise operators (use grep!) and
         make sure we can abstract them to a small number of
         procedures
      2. Here's the list:
$ grep -nH -e ">>" -e "<<" -e "[&|]" src/*.c
    - three fundamental operations needed:
      1. Test if a particular bit is set (create a mask by shifting,
         then AND it with the integer in question, then test whether
         the result is nonzero)
      2. Isolate an integer (extract a contiguous range of bits)
	 + look this up: there should be plenty of examples of it
	 + Guile has this (meaning it has been done before, so there
           should be discussion about it)
	 + idea:
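	   #+CODE:
	   #include <limits.h>
	   unsigned int
	   int_extract (unsigned int start, unsigned int end, unsigned int x)
	   {
	     /* keep bits start .. end - 1; assumes start < end and
	        end <= width of unsigned int */
	     unsigned int len = end - start;
	     unsigned int width = sizeof (unsigned int) * CHAR_BIT;
	     x >>= start; /* okay to modify local x */
	     /* mask with the low len bits set */
	     unsigned int mask = UINT_MAX >> (width - len);
	     return x & mask; /* exclude all the values above len
                                 positions */
	   }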
	     
      3. Population count
	 + gcc __builtin_popcount (x) does this
	   * test with #if __GNUC__ __builtin_popcount (x) #elif
             <some other algorithm>
	   * gcc should optimize this well
	   * provide a wrapper with the conditional compilation, such
             that user programs (and the library, e.g. recomb.c) can
             call the procedure without conditional compilation
	 + there are many portable algorithms
	 + known as "Hamming weight," "population count" and other names
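    A sketch of the population-count wrapper described in item 3: the
    conditional compilation lives inside one function so that callers
    never see it.  The name bits_popcount is hypothetical, and the
    fallback is one of the many portable algorithms (clear the lowest
    set bit until none remain).

```c
/* Number of set bits in X. */
static inline unsigned int
bits_popcount (unsigned int x)
{
#if defined (__GNUC__)
  return (unsigned int) __builtin_popcount (x);
#else
  unsigned int n = 0;
  while (x != 0)
    {
      x &= x - 1;   /* clears the lowest set bit */
      n++;
    }
  return n;
#endif
}
```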
**** Get rid of bithacks.h
*** done Sparse matrix procedures [100%]
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-03-25 Thu 11:52
    :END:
    - [X] sparse matrix multiplication
    - [X] sparse matrix API (make a layer over sparse matrices so the
	  user doesn't need to manipulate them)
	  + [X] create a data type (structure) that will hold both the
            mating table and the recombination table
	  + [X] create a typedef for sparse matrices so that they appear to
            be recombination tables at all times
    - [X] re-write set_rec_table () and recombination ()
	  + Totally re-done with original code
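    A sketch of the simplest sparse product, a matrix times a vector,
    with the matrix stored as a list of nonzero entries (coordinate
    format).  The struct layout and the names are hypothetical; the
    library's actual sparse type is not described in these notes.

```c
#include <stddef.h>

/* One nonzero entry: value VAL at position (ROW, COL). */
struct sp_entry
{
  size_t row;
  size_t col;
  double val;
};

/* Compute y = A x, where A is given by its NNZ nonzero entries.
   Y has NROWS elements and is overwritten. */
void
sp_matvec (size_t nnz, const struct sp_entry a[],
           size_t nrows, double y[], const double x[])
{
  for (size_t i = 0; i < nrows; i++)
    y[i] = 0.0;
  for (size_t k = 0; k < nnz; k++)
    y[a[k].row] += a[k].val * x[a[k].col];
}
```

    Only the stored entries are visited, so the cost scales with the
    number of nonzeros rather than with the full table size.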
*** done Parallelization
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-03-25 Thu 11:52
    :END:
    - at least set_rec_table and recombination () need parallelization
    - may need to use delay directive for facultative copying
      + Update: not using facultative copying
      + much more important to parallelize recombination ()
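    A sketch of how a recombination step could be parallelized over
    offspring genotypes.  Everything here is an assumption: OpenMP as
    the mechanism, random mating, a dense table stored flat so that
    table[(i * n + j) * n + k] is the probability that parents i and j
    produce offspring k, and the name recombine_dense.

```c
/* out[k] = sum over i, j of freq[i] * freq[j] * P(k | i, j).
   Each iteration writes a different out[k], so iterations are
   independent.  The pragma is only seen when OpenMP is enabled. */
void
recombine_dense (int n, const double table[], const double freq[],
                 double out[])
{
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int k = 0; k < n; k++)
    {
      double sum = 0.0;
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          sum += freq[i] * freq[j] * table[(i * n + j) * n + k];
      out[k] = sum;
    }
}
```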
*** Testing [100%]
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-03-28 Sun 09:54
    :END:
    - [X] test recombination by printing a recombination table and
          checking the entries
      + [X] small table (tlta)
      + [X] large table (e.g. 64x64x64): you will need to write a
	    program to do this
    - [X] redo TLTA with new API
*** done recombination ()
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-03-29 Mon 10:27
    :END:
    - [ ] replace recombination for loops with sparse matrix procedure
    - [ ] parallelize
* In the future
** todo Make sim_stop_ck () accept a function
** todo Mating step wrapper code
   recombination () is so simple it should be inlined; however, since
   it is now external, I cannot inline it.  Then there will be only
   one library function for recomb.c
** Incorporate graphing utilities
** Archive							    :ARCHIVE:
*** todo Bit-twiddling library
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-24 Wed 14:24
    :END:
    - I should study bit-twiddling and add a library file for it
    - bithack.h does some good, but does not include some things that
      could replace get_bit and so on
    - this sort of programming is essential to diallelic models; I
      should master it
* Things to think about
** Archive							    :ARCHIVE:
*** The user should define
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-24 Wed 14:25
    :END:
    1. transmission probabilities
    2. selection scheme 
    3. non-random mating (mating table)
    4. recombination scheme
    5. number of loci
    6. alleles
    7. mutation rates
    8. I/O

    Library routines should
    1. Help users define the above quantities
    2. take these as inputs and produce routine simulations

    (1) are features; (2) are essential.  
*** Finalize algorithm
    :PROPERTIES:
    :ARCHIVE_TIME: 2010-02-24 Wed 14:25
    :END:

    Take Maria's suggestions about how to do this; she knows what she
    is talking about!  The algorithm needs to take phenogenotypes and
    1. perform selection
    2. perform non-random mating
    3. perform recombination 
    4. advance any "life-stages" to the next "life-stage" (which may be
       death).

    In other words, this is going to need to model a huge dynamical
    system flexibly (defined by the user).  Learn how to take functions
    as arguments and call them.
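    A sketch of taking functions as arguments: each life-cycle step is
    a function pointer, and one generation is a loop over an array of
    them.  The typedef, the driver, and the two toy stages are all
    hypothetical.

```c
#include <stddef.h>

/* A life-cycle stage transforms the frequency vector in place. */
typedef void (*stage_fn) (size_t len, double freq[]);

/* Apply each user-supplied stage in order. */
void
run_generation (size_t nstages, const stage_fn stages[],
                size_t len, double freq[])
{
  for (size_t s = 0; s < nstages; s++)
    stages[s] (len, freq);
}

/* Toy selection stage: genotype 0 is three times as fit. */
void
stage_select (size_t len, double freq[])
{
  if (len > 0)
    freq[0] *= 3.0;
}

/* Rescale so that the frequencies sum to one. */
void
stage_normalize (size_t len, double freq[])
{
  double total = 0.0;
  for (size_t i = 0; i < len; i++)
    total += freq[i];
  if (total > 0.0)
    for (size_t i = 0; i < len; i++)
      freq[i] /= total;
}
```

    The user would build the array, e.g. { stage_select,
    stage_normalize }, and pass it to run_generation () together with
    the frequency vector.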
