Installation instructions for LAM 6.5.9
=========================================

This file contains the installation instructions for LAM/MPI version
6.5.9.  There are also some tips for writing/developing/running
parallel programs, especially in parallel environments that are
clusters of workstations.  If you have problems configuring/compiling
LAM, see the "Troubleshooting" section, below.

Here's a brief table of contents (### indicates updates for this
version):

 ### * For the impatient
 ### * Unpacking the distribution
 ### * Architecture-specific notes
     * Configuration
 ###   - ROMIO issues
 ###   - MPI 2 C++ issues
 ###   - VPATH builds
 ###   - ./configure switches
 ###   - 64 bit LAM
 ### * Building LAM
 ### * Building LAM, ROMIO, and MPI 2 C++ examples
 ### * Boot schema
     * Using LAM
       - Typical usage
 ###   - The LAMHOME and TROLLIUSHOME environment variables
       - Starting LAM
       - Common filesystems
       - Using LAM with AFS
       - Using LAM with ssh
 ### * Troubleshooting
 ###   - Problems with building LAM
       - Problems with running LAM and/or user programs
 ###   - Insufficient shared resources
     * Clearing disk space
     * Tuning LAM
       - Short/long protocol
 ###   - Shortcircuit send/receive 
       - TCP transport
       - Usysv and sysv transports
       - Use of the global pool
       - Synchronization
       - Usysv transport spin-locks
       - Sysv transport semaphores


For the impatient
-----------------

If you don't want to read the rest of the instructions, the following
should do the trick for most situations:

     % gunzip -c lam-6.5.9.tar.gz | tar xf -
     % cd lam-6.5.9
     % ./configure --prefix=/path/to/install/in
     [...lots of output...]
     % make
     [...lots of output...]
     % make install
     [...lots of output...]

     # Ensure that $prefix/bin is in your $path so that LAM's
     # newly-created "mpicc" can be found

     % make examples   # This step is optional
     [...lots of output...]

If you do not specify a prefix, LAM will first look for "lamclean" in
your path.  If lamclean is found, it will use the parent of the
directory where lamclean is located as the prefix.  Otherwise,
/usr/local is used (like most GNU software).
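
The default-prefix rule above can be sketched as a small Bourne shell
function.  This only illustrates the rule as it is described here; it
is not the code that LAM's configure script runs, and the name
guess_lam_prefix is made up for this example.

```shell
# Sketch of the default-prefix rule; guess_lam_prefix is a hypothetical
# name, not something that LAM provides.
guess_lam_prefix() {
    if lamclean_path=$(command -v lamclean 2>/dev/null); then
        # e.g. /opt/lam/bin/lamclean -> /opt/lam
        dirname "$(dirname "$lamclean_path")"
    else
        # lamclean is not in the path
        echo /usr/local
    fi
}

guess_lam_prefix
```

Running this before configure shows which prefix would be chosen if
--prefix were omitted.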

Now go read the RELEASE_NOTES file; it contains all the information
about the new features of this release of LAM/MPI.

Common causes of failure:

     - No C++ compiler installed; use --without-mpi2cpp configure option
     - C++ compiler does not support required C++ features; use
       --without-mpi2cpp configure option
     - No Fortran compiler installed; use --without-fc configure option
     - "mpicc" cannot be found, or other kinds of failures when
       compiling the examples; ensure that the
       newly-compiled-and-installed "mpicc" is found *first* in your
       path (i.e., if mpicc appears in your path multiple times, the
       new one must be first)
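
To check the last item, it can help to list every "mpicc" in the
search path in the order the shell will find them.  The following is
a minimal Bourne shell sketch; list_in_path is a hypothetical helper
written for this example, not a LAM command.

```shell
# List every executable called NAME found on $PATH, in search order.
# list_in_path is a hypothetical helper, not a LAM command.
list_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty $PATH entry means the current directory.
        candidate=${dir:-.}/$name
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS=$old_ifs
}

# The first line printed is the mpicc that the shell will run.
list_in_path mpicc
```

If the first line printed is not the mpicc under the new $prefix/bin,
reorder the search path before running "make examples".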

Unpacking the distribution
--------------------------

The LAM distribution is packaged as a compressed tape archive,
lam-6.5.9.tar.Z, lam-6.5.9.tar.gz, or lam-6.5.9.tar.bz2.  It is
available from the main LAM web site: http://www.lam-mpi.org/

Uncompress the archive and extract the sources. 

     % gunzip -c lam-6.5.9.tar.gz | tar xf -

or 

     % uncompress -c lam-6.5.9.tar.Z | tar xf -

or 

     % bunzip2 -c lam-6.5.9.tar.bz2 | tar xf -


Architecture-specific notes
---------------------------

LAM/MPI will build on just about any POSIX system.  There are,
however, a few restrictions:

--- Microsoft Windows

Microsoft Windows is not a POSIX platform.  LAM/MPI currently will not
build in a Windows environment.

--- AIX

It appears that GNU libtool does not presently support building shared
libraries on AIX.  This has been tested on AIX 4.3.3; it is not known
if GNU libtool builds shared libraries on other versions of AIX.

Additionally, in some cases, GNU libtool apparently does not function
completely properly when using the "xlc" compiler.  Use "cc", instead
(they are both the same compiler anyway).

Finally, there have been repeatable problems with AIX's "make" when
building ROMIO.  This does not appear to be ROMIO's fault -- it
appears to be a bug in AIX's "make".  The LAM Team suggests that you
use GNU "make" (ftp://ftp.gnu.org/gnu/make/) when building on AIX
platforms to avoid these problems.

--- Various BSD systems

The version of "make" that is distributed on some BSD systems (e.g.,
FreeBSD) requires the use of the "-i" parameter to some of LAM's make
targets.  For example:

     make -i clean

--- HP-UX

It appears that the default C++ compiler on HP-UX (CC) is a pre-ANSI
standard C++ compiler.  As such, it will not build the C++ bindings
package.  The C++ compiler "aCC" should be used instead; select it by
passing "--with-cxx=aCC" as an option to configure.

--- Mac OS X

Earlier versions of OS X (prior to 10.2 "Jaguar") do not support
System V semaphores.  On these systems, the sysv and usysv rpis are
not available.

By default, OS X uses the HFS+ filesystem, which is not case
sensitive.  Historically, the wrapper compilers provided by LAM/MPI
have been mpicc, mpiCC, and mpif77.  Unfortunately, on a filesystem
that is not case sensitive, mpicc and mpiCC name the same file, so the
C++ compiler can now be invoked by the command mpic++.

ROMIO is not smart enough to properly configure itself on OS X.  There
have been reports that passing the argument

    --with-romio-flags="-cflags=-DNO_AIO"

to configure will allow ROMIO to properly compile and function.  This
has not been well tested by the LAM team.


Configuration
-------------

LAM uses a GNU configure script to perform site and architecture
specific configuration.

Change directory to the top level LAM directory (lam-6.5.9) and run the
configure script.

     % ./configure {options}

or 

     % sh ./configure {options}

By default the configure script sets the LAM install directory to the
parent of where lamclean is found (if it is in your path), or
/usr/local if lamclean is not in your path.  This can be overridden
with the --prefix option (see below).


ROMIO issues
------------

Note that the ROMIO package does not currently support many GNU-like
configure switches.  In particular, attempting to use any of the
directory-specifying options (other than --prefix) will not work as
expected with ROMIO.  ROMIO installs everything under
$(DESTDIR)/$prefix.  Hence, if you attempt to use switches such as
--libdir, --bindir, etc. to LAM's configure, all of the LAM (and the
C++ bindings) will install as expected, but ROMIO will still install
itself under $prefix.

The ROMIO authors have been notified of this issue.


MPI 2 C++ issues
----------------

--- C++ Exceptions

The default is to build LAM/MPI with the C++ bindings, but without C++
exception support.

Enabling C++ exceptions typically entails a slight degradation of
run-time performance because of extra bootstrapping required for every
function call (particularly with gcc/g++).  As such, they are disabled
by default, and the MPI::ERRORS_THROW_EXCEPTIONS error handler will
only print out error messages.  If full exception handling
capabilities are desired, LAM must be configured with the
"--with-exceptions" flag.  It should be noted that some C++ (and C and
Fortran) compilers need additional command line flags to properly
enable exception handling.

For example, with gcc/g++ 2.95.2 and later, gcc, g77, and g++ all
require the command line flag "-fexceptions".  gcc and g77 require
"-fexceptions" so that they can pass C++ exceptions through C and
Fortran functions properly.  As such, *all* of LAM/MPI must be
compiled with the appropriate compiler options, not just the C++
bindings.  Using MPI::ERRORS_THROW_EXCEPTIONS without having compiled
LAM with proper exception support will cause undefined behavior (read:
core dumps and other Bad Things).

If building with IMPI or the C++ bindings, LAM's configure script will
automatically guess the necessary compiler exception support command
line flags for the gcc/g++ and KCC compilers.  That is, if a user
selects to build the MPI 2 C++ bindings and/or the IMPI extensions,
and also selects to build exception support, and g++ or KCC is
selected as the C++ compiler, the appropriate exceptions flags will
automatically be used.

Users with other compilers that require command line flags for
exception support should use the "--with-exflags=FLAGS" command line
switch to configure.

Note that this also applies even if you do not build the C++ bindings
-- if LAM is to call C++ functions that may throw exceptions
(e.g., from an MPI error handler or other callback function), you need
to build LAM with the appropriate exceptions compiler flags.


--- Mixing Vendor Compilers

A single vendor product line should be used to compile all of the C,
Fortran, and C++ code.  That is, if gcc is used to compile LAM, g++
should be used to compile the C++ bindings, and gcc/g++/g77 should be
used to compile any user programs.  Mixing multiple vendors' compilers
between different components of LAM/MPI and/or to compile user MPI
programs, particularly when using the C++ MPI bindings, is almost
guaranteed not to work.

C++ compilers are not link-compatible -- compiling the C++ bindings
with one C++ compiler and compiling a user program that uses the MPI
C++ bindings with a different C++ compiler will almost certainly
produce linker errors.

Indeed, if exception support is enabled in the C++ bindings, it will
only work if the C and/or Fortran code knows how to pass C++
exceptions through their code.  This will only happen properly when
the same compiler (or a single vendor's compiler product line, such as
gcc, g77, and g++) is used to compile *all* components -- LAM/MPI, the
C++ bindings, and the user program.  Using multiple vendor compilers
with C++ exceptions will almost certainly not work (read: core dumps
and other Bad Things).

The one possible exception to this rule (pardon the pun) is the KCC
compiler.  Since KCC turns C++ code to C code and then gives it to the
back end "native" C compiler, KCC may work properly with the native C
and Fortran compilers.


VPATH builds
------------

Alternatively, LAM supports the "VPATH" building mechanism.  If
LAM/MPI is to be installed in multiple environments that require
different options to configure, or require different compilers (such
as compiling for multiple architectures/operating systems), the
following form can be used to configure LAM:

     % cd /some/temp/directory
     % LAMTOP/configure {options}

where LAMTOP is the directory where the LAM/MPI distribution tarball
was expanded.  This form will build the LAM executables and libraries
under /some/temp/directory and will not produce any files in the
LAMTOP tree.  It allows multiple, concurrent builds of LAM/MPI from
the same source tree.  

Note that you must have a VPATH-enabled "make" in order to use this
form.  The GNU "make" (ftp://ftp.gnu.org/gnu/make/) supports VPATH
builds, for example, but the Solaris Workshop 5.0 "make" does not.
Parts of LAM/MPI may compile correctly in a VPATH build without a
VPATH-enabled "make", but ROMIO will not.
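
As an illustration, several build trees can be configured from one
source tree with a small wrapper.  This is a sketch only: the function
name vpath_configure, the default value of LAMTOP, and the directories
in the commented example are assumptions made for this example.
LAMTOP must be an absolute path because the wrapper changes directory
before it runs configure.

```shell
# LAMTOP: where the tarball was expanded (assumed location; adjust).
LAMTOP=${LAMTOP:-$HOME/src/lam-6.5.9}

# vpath_configure BUILDDIR [configure options...]
# Runs LAM's configure from inside BUILDDIR so that no files are
# produced in the LAMTOP tree.  Hypothetical helper, not part of LAM.
vpath_configure() {
    builddir=$1
    shift
    mkdir -p "$builddir" &&
        ( cd "$builddir" && sh "$LAMTOP/configure" "$@" )
}

# Example (not run here): one build tree per RPI.
#   vpath_configure /tmp/lam-tcp   --with-rpi=tcp   --prefix=/opt/lam-tcp
#   vpath_configure /tmp/lam-usysv --with-rpi=usysv --prefix=/opt/lam-usysv
```

Each build tree is then built independently by running "make" and
"make install" inside it.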


./configure switches
--------------------

The configure script will create several configuration files,
including share/include/lam_config.h.  You may wish to inspect this
file for a sanity check, but ./configure usually guesses correctly.

There are many options available from the configure script.  You can
use the command "./configure --help" to list them all.  An explanation
of each follows (shown here in alphabetical order):

--disable-static

     Do not build static libraries.  This flag is only meaningful when
--enable-shared is specified; if this flag is specified without
--enable-shared, it is ignored, and static libraries are created.

--enable-echo

     Will echo all of the commands that configure executes.  This is
usually for debugging purposes only, and is not recommended for end
users.

--enable-shared

     Build shared libraries.  Note that this option is incompatible
with --with-romio (which is the default) and --with-mpi2cpp (which is
also the default) because (among other reasons) ROMIO expects to find
libmpi.a, not libmpi.so.

     Also note that enabling building shared libraries does *not*
disable building the static libraries.  Specifying --enable-shared
without --disable-static will result in a build taking twice as long,
and installing both the static and shared libraries.

     Finally, note that neither ROMIO nor the MPI 2 C++ bindings
currently support shared libraries.  They will always be built as
static libraries.

--prefix=PREFIX

     Sets the installation location for the LAM binaries, libraries,
etc., to the directory PREFIX.  PREFIX must be specified as an
absolute directory name.

--with-cc=CC

     Use the C compiler CC.  The C compiler can also be selected by
setting the "CC" environment variable before running configure.  This
compiler will be used both to compile LAM, and as the default compiler
for the hcc(1) and mpicc(1) wrapper compilers.

--with-cflags=CFLAGS

     Use the C compiler flags CFLAGS.  The flags passed to the C
compiler can also be selected by setting the "CFLAGS" environment
variable before running configure.  These flags are used to compile
LAM, ROMIO, and some example programs that come with LAM.  If CFLAGS
are not specified, ./configure will pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-cxx=CXX

     Use the C++ compiler CXX.  The C++ compiler can also be selected
by setting the "CXX" environment variable before running configure.
This compiler will be used to compile the MPI 2 C++ bindings, IMPI
support, and will be used as the default compiler for the hcp(1) and
mpiCC(1) wrapper compilers.

--with-cxxflags=CXXFLAGS

     Use the C++ compiler flags CXXFLAGS.  The flags passed to the C++
compiler can also be selected by setting the "CXXFLAGS" environment
variable before running configure.  These flags will be used when
compiling the MPI 2 C++ bindings, IMPI support, as well as some
example programs that come with LAM.  If CXXFLAGS are not specified,
./configure will pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-cxxldflags=CXXLDFLAGS

     Use the C++ linker flags CXXLDFLAGS.  These flags will be used
when compiling the MPI 2 C++ bindings, IMPI support, as well as some
example programs that come with LAM.  If CXXFLAGS are not
specified, ./configure will pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-exceptions

      Used to enable exception handling support in the C++ bindings
for MPI.  Exception handling support (i.e., the
MPI::ERRORS_THROW_EXCEPTIONS error handler) is disabled by default.
See the section "MPI 2 C++ Issues", above.

--with-exflags=FLAGS

      Used to specify any command line arguments that are necessary
for the C, C++, and Fortran compilers to enable C++ exception support.
This switch is ignored unless --with-exceptions is also specified.

      This switch is unnecessary for gcc/g77/g++ version 2.95 and
above -- "-fexceptions" will automatically be used (when building
--with-exceptions).  Additionally, this switch is unnecessary if the
KCC compiler is used -- "-x" is automatically used.

      See the section entitled "MPI 2 C++ Issues", above.

--with-fc=FC

     Use the Fortran compiler FC.  Specify FC=no (or --without-fc) to
disable Fortran support if you do not have a Fortran compiler or do
not require such support.  This compiler will be used both to compile
LAM, and as the default compiler for the hf77(1) and mpif77(1) wrapper
compilers.

--with-fflags=FFLAGS

     Use the Fortran compiler flags FFLAGS when compiling LAM.  The
flags passed to the Fortran compiler can also be selected by setting
the "FFLAGS" environment variable before running configure.  These
flags will be used only when compiling some example programs that come
with LAM.  If FFLAGS are not specified, ./configure will pick
optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--with-impi

     Use this switch to enable the IMPI extensions.  The IMPI
extensions are still considered experimental, and are disabled by
default.

--with-lamd-ack=SEC

     Number of seconds until an ACK is resent between LAM daemons.
You probably shouldn't need to change this; the default is one
second.

--with-lamd-hb=SEC

    Number of seconds between heartbeat messages in the LAM daemon
(only applicable when running in fault tolerant mode).  You probably
shouldn't need to change this; the default is 120 seconds.

--with-lamd-boot=SEC

     Set the default number of seconds to wait before a process started
on a remote node is considered to have failed (e.g., during lamboot).
You probably shouldn't need to change this; the default is 60 seconds.

--with-ldflags=LDFLAGS

     Use the LD linker flags LDFLAGS.  If this flag is not set on the
./configure command line, the value for CFLAGS is used.  These flags
are used to link LAM executables and all example programs that come
with LAM.  If LDFLAGS (and CFLAGS) are not specified, ./configure will
pick optimization flags to use.

     These flags are *not* used as default flags in any of the wrapper
compilers.

--without-mpi2cpp

     Build LAM without the MPI-2 C++ bindings (see chapter 10 of the
MPI-2 standard); the default is to build them.  The C++ bindings
require some advanced features of the C++ compiler.  While most modern
C++ compilers now support all the required features, you may encounter
problems on some platforms.  Consult the mpi2c++/README file for more
information.

--without-profiling

     Build LAM/MPI without the MPI profiling layer.  The default is to
build this layer, since ROMIO uses it.  See the --without-romio option
for more details.

--with-pthread-lock

     Use a process shared pthread mutex to lock access to the shared
memory pool rather than the default SYSV semaphore.  This option is
only valid with the "usysv" RPI, and on systems which support process
shared pthread mutexes.

--with-purify

     Causes LAM to zero out all data structures before using them.
This option is not necessary to make LAM function correctly (LAM
already zeros out relevant structure members when necessary), but it
is very helpful when running MPI programs through memory checking
debuggers, such as purify and the Solaris Workshop bcheck program.
See the "Zeroing out LAM buffers before use" section of the
RELEASE_NOTES file for more information.  The default is to not enable
this option.

--without-romio

     Build LAM without ROMIO support (ROMIO provides the MPI-2 I/O
support, see chapter 9 of the MPI-2 standard); the default is to build
*with* ROMIO support.  ROMIO is known to only work on certain systems.
Consult the romio/README file for more information.  Note that ROMIO
support is incompatible with --enable-shared, because (among other
reasons) ROMIO expects to find libmpi.a, not libmpi.so.

     Note also that building ROMIO implies building the profiling
layer.  ROMIO makes extensive use of the MPI profiling layer; that is,
you cannot select --without-profiling without also specifying
--without-romio. 

--with-romio-flags=FLAGS

     Pass FLAGS to ROMIO's configure script when it is invoked during
the build process.  This switch is to effect specific behavior in
ROMIO, such as building for a non-default file system (e.g., PVFS).
Note that LAM already sends the following switches to ROMIO's
configure script -- the --with-romio-flags switch should not be used
to override them:

     --prefix
     -mpi
     -mpiincdir
     -cc
     -fc
     -debug (if -g is specified in CFLAGS)
     -cflags
     -fflags
     -nof77 (if --without-fc is selected in LAM)
     -make
     -mpilib

--with-rpi=RPI

     Build with request progression interface (RPI) transport layer
RPI [RPI=tcp]. RPI must be one of: tcp, sysv, or usysv.  If this option
is not specified, the RPI transport layer defaults to tcp.  Please
refer to the RELEASE_NOTES file for descriptions of the RPI transport
layers.

--with-rsh=RSH

     Use RSH as the remote shell command. For example if you want to
use the secure shell ssh then specify --with-rsh="ssh -x" (note that
the "-x" is necessary to prevent the ssh 1.x series of clients from
sending its standard banner information to standard error, which will
cause recon/lamboot/etc. to fail).  This shell command will be used to
launch commands on remote nodes from binaries such as lamboot, wipe,
etc.  The command can be one or more shell words, such as a command
and multiple command line switches.

This value can be overridden at recon/lamboot/etc. run time with the
LAMRSH environment variable.  See the RELEASE_NOTES file for more
details.
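
For example, in a Bourne-style shell the run-time override could look
like the sketch below.  The value "ssh -x" is simply the one used in
the --with-rsh example above; csh users would use "setenv" instead.

```shell
# Override the remote shell chosen at configure time for this session.
# "ssh -x" is the example value used above for --with-rsh.
LAMRSH="ssh -x"
export LAMRSH

# Subsequent recon/lamboot/etc. invocations started from this shell
# see the new value, for example (not run here):
#   lamboot
```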

--with-select-yield

     Force the use of select() to yield the processor. 

--with-shm-maxalloc=BYTES

     Use BYTES as the size of the maximum allocation from the shared
memory pool.  If no value is specified, configure will set the size
according to the value of shm-poolsize (below).  See "Usysv and Sysv
transports", below.

--with-shm-poolsize=BYTES

     Use BYTES as the size of the shared memory pool.  If no size is
specified, configure will determine a suitably large size to use.  See
"Usysv and Sysv transports", below.

--with-shm-short=BYTES

     Use BYTES as the maximum size of a short message when
communicating via shared memory.  Default is 8 KB.

--without-shortcircuit

     Disable the send/receive short circuiting optimization. The short
circuit optimization has proven to be fairly stable, and this option
is not usually necessary.  It remains for hysterical raisins.

--with-signal=SIGNAL

     Use SIGNAL as the signal used internally by LAM. The default
value is "SIGUSR2". To set the signal to "SIGUSR1" for example,
specify --with-signal=SIGUSR1.

--with-tcp-short=BYTES

     Use BYTES as the maximum size of a short message when
communicating over TCP.  Default is 64 KB.  This is relevant to all
RPIs, since the shared memory RPIs are multi-protocol -- they will use
TCP when communicating with MPI ranks that are not in the same node.

--with-thread

     This option is not yet supported.  Do not use it.

--with-trillium

     Build and install the Trillium support executables, header files,
and man pages.  These extra Trillium executables, header files, and
man pages are not necessary for normal MPI operation; they are
intended for Trillium developers and certain third party products that
interact with the lower layer of LAM/MPI.  Building XMPI
(http://www.lam-mpi.org/software/xmpi/), for example, requires that
all the Trillium header files were previously installed.  Hence, if
you intend to compile XMPI after installing LAM/MPI, you should use
this option.

     Building the extra Trillium executables and installing the
Trillium header files and man pages used to be the default in prior
versions of LAM/MPI.  However, since few users actually used them, it
has been relegated to an option.


Example: 

     % ./configure --with-rpi=usysv --with-cc=/bin/cc \
         --with-cflags=-O4 --without-fc

Compile for the usysv RPI using the C compiler /bin/cc with options
-O4 and disable Fortran support.



64 bit LAM
----------

LAM has been verified as being 64 bit clean under Solaris 7, AIX
4.3.3, IRIX 6.5, and Alpha/Linux 2.2.x.  To compile LAM with the 64
bit architecture, you will likely need to add compiler and linker
flags with configure.  For example, if you are using the Solaris
Workshop 5.0 compilers on Solaris 7, you can use the following:

     % ./configure --with-cflags='-xarch=v9' --with-ldflags='-xarch=v9'

Other compilers/architectures will have their own flags to enable 64
bit compilation; consult the documentation for your compiler.  Of
course, you can also add in any debugging/optimization flags in the
cflags and ldflags strings as well.


Building LAM
------------

Once the configuration step has completed, build LAM by doing:

     % make

in the top level LAM directory. This will build the LAM binaries and
libraries within the distribution source tree.  Once they have
compiled properly, you can install them with:

     % make install

*** NOTE ***: Previous versions of LAM included "make install" in the
default "make".  THIS IS NO LONGER TRUE.  You *must* execute "make
install" to install the LAM executables, libraries, and header files
to the location specified by the --prefix option to configure.


--- Building LAM, ROMIO, and MPI 2 C++ examples

LAM and the ROMIO and MPI-2 C++ packages all include example code that
can be built with a single top-level "make examples".  Note that the
examples can *only* be built after a successful "make install", and
$prefix/bin has been placed in your $path.

     % make examples

This will do the following (where TOPDIR is the top-level directory of
the LAM source tree):

  1. Build the LAM examples.  They are located in:

     TOPDIR/examples

  2. If LAM was configured to build the C++ examples (i.e., if you did
     not configure with --without-mpi2cpp), the MPI 2 C++ examples
     will be built.  They are located in:

     TOPDIR/mpi2c++/contrib

  3. If you configured LAM with ROMIO support (i.e., if you did not
     configure with --without-romio), the ROMIO examples will be
     built.  See the notes about ROMIO in the RELEASE_NOTES file.
     They are located in:

     TOPDIR/romio/test

Additionally, the following three commands can be used to build each
of the packages' examples separately (provided that support for each
was compiled in to LAM) from TOPDIR:

     % make lam-examples
     % make romio-examples
     % make mpi2c++-examples


Boot schema
-----------

A boot schema is a description of a multicomputer on which LAM will be
run.  You can create boot schema files (see bhost(5) for syntax) for
typical configurations of the local multicomputer(s).  Place these
files under etc/ in the installation directory.  They will be found by
LAM tools such as lamboot(1), recon(1) and wipe(1) if you do not
specify a filename on the command line to use instead of the default.

The default etc/lam-bhost.def file comes with a single line:

     localhost

Hence, if you simply run "lamboot", you will get a LAM with one node
(the localhost) booted.

You can re-write the etc/lam-bhost.def file if you are frequently
going to boot LAM to the same configuration.  For example, if you
frequently use 4 workstations: inky, blinky, pinky, and clyde, you can
have an etc/lam-bhost.def file as follows:

     inky
     blinky
     blinky
     blinky
     blinky
     pinky cpu=2
     clyde user=lamrocks

Note that "blinky" is listed 4 times.  This tells LAM/MPI that blinky
has 4 CPUs (relevant for the "C" notation to the mpirun command; see
mpirun(1)).  An alternate (and equivalent) notation is used for pinky
-- "cpu=2" specifies that pinky has 2 CPUs.

You can also specify different remote usernames on the remote nodes;
the username "lamrocks" is used on the machine "clyde" in the above
example.
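
The two CPU notations can be cross-checked with a short script that
reports how many CPUs a boot schema assigns to each host.  This is a
sketch under stated assumptions: it treats every line as contributing
either 1 CPU or the N from "cpu=N", and it skips blank lines and
lines starting with "#" (see bhost(5) for the authoritative syntax).
The name cpus_per_host is made up for this example.

```shell
# cpus_per_host FILE: print "host cpus" for each host in a boot schema.
# Hypothetical helper; it only understands the two notations shown above.
cpus_per_host() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank and comment lines
        {
            n = 1                           # a plain line counts as one CPU
            for (i = 2; i <= NF; i++)
                if ($i ~ /^cpu=[0-9]+$/)
                    n = substr($i, 5) + 0   # "cpu=N" counts as N CPUs
            if (!($1 in cpus))
                order[++hosts] = $1         # remember first-seen order
            cpus[$1] += n
        }
        END {
            for (j = 1; j <= hosts; j++)
                print order[j], cpus[order[j]]
        }
    ' "$1"
}

# Try it on the example boot schema from above.
schema=$(mktemp)
cat > "$schema" <<'EOF'
inky
blinky
blinky
blinky
blinky
pinky cpu=2
clyde user=lamrocks
EOF
cpus_per_host "$schema"
rm -f "$schema"
```

For the example schema this prints "inky 1", "blinky 4", "pinky 2",
and "clyde 1", one per line.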


Using LAM
---------

If the LAM installation directory is moved after it is built, users
must set the LAMHOME environment variable to the new location.  This
is the *only* case where the LAMHOME environment variable should be
set -- otherwise, it should be left unset.  See "The LAMHOME and
TROLLIUSHOME environment variables", below.

On each UNIX machine, users must add the LAM executable directory to
their shell's search path.  LAM executables are found under
$prefix/bin.  These steps must be taken on each and every machine that
might be part of a multicomputer running LAM.  Set the variables in
the shell's start-up file, *not the .login file*.
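
A minimal sketch of such a start-up file fragment, in Bourne shell
syntax, follows.  LAM_PREFIX is a hypothetical helper variable used
only in this example; set it to the --prefix that was given to
configure.  Which start-up file is read for non-interactive remote
commands depends on the shell, so check your shell's manual page.

```shell
# Fragment for a Bourne-style shell start-up file.
# LAM_PREFIX is a hypothetical helper variable; set it to the --prefix
# that was given to configure.
LAM_PREFIX=${LAM_PREFIX:-/usr/local}

case ":$PATH:" in
    *":$LAM_PREFIX/bin:"*)
        ;;                              # already in the search path
    *)
        PATH=$LAM_PREFIX/bin:$PATH      # put LAM's tools first
        ;;
esac
export PATH
```

The "case" test keeps the search path from growing each time the file
is read.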


--- Typical usage

LAM is a daemon-based implementation of MPI.  This means that a daemon
process is launched on each machine that will be in the parallel
environment.  Once the daemons have been launched, LAM is ready to be
used.  A typical usage scenario is as follows:

     - Boot LAM on all the nodes
     - Run MPI programs
     - Shut down LAM
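
The three steps can be sketched as a Bourne shell function built on
lamboot(1), mpirun(1) with the "C" notation, and wipe(1).  The name
lam_session is made up for this example, and the sketch assumes that
a boot schema file is passed as the first argument.

```shell
# lam_session BOOT_SCHEMA PROGRAM [args...]
# Hypothetical wrapper around the boot / run / shut down cycle.
lam_session() {
    schema=$1
    shift
    lamboot "$schema" || return 1   # boot LAM on all the nodes
    status=0
    mpirun C "$@" || status=$?      # run the MPI program on every CPU
    wipe "$schema"                  # shut down LAM, even if the run failed
    return "$status"
}

# Example (not run here; "hostfile" and "./my_mpi_program" are
# placeholders):
#   lam_session hostfile ./my_mpi_program
```

In practice LAM is often left booted across many mpirun invocations;
the wrapper simply shows the order of the steps.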

LAM does not need to be booted in order to compile MPI programs.

LAM is a user-based MPI environment; each user who wishes to use LAM
must boot their own LAM environment.  LAM is not a client-server
environment where a single LAM daemon can service all LAM users on a
given machine.  There are no future plans to make LAM client-server
oriented (unless someone volunteers to write it :-).

As a side-effect of this design, each user must have an account on
each machine that they wish to use LAM on. 


--- The LAMHOME and TROLLIUSHOME environment variables

Note that it is typically *not necessary* to set the LAMHOME and/or
TROLLIUSHOME environment variables.  These variables are *only*
necessary if the $prefix of the LAM installation is moved after "make
install" was run.

As such, there are very few cases when one would need to set LAMHOME
or TROLLIUSHOME.  The LAM Team recommends that you leave these
variables unset.


--- Starting LAM

The recon(1) tool checks if LAM can be started on the given boot
schema.  There are several prerequisites that enable LAM to be started
on a remote machine:

     * The machine must be reachable and operational. 
     * The user must have an account on the machine. 
     * The user must be able to rsh(1) (or use an rsh substitute --
       see above for details on how to specify a different remote
       shell) to the machine (typically, permissions must be set in
       the user's .rhosts file on the machine).
     * The user must be able to write to /tmp.
     * The LAM executables must be locatable on that machine, using
       the shell's search path and possibly the LAMHOME environment
       variable, as described above.
     * The shell's start-up script must not print anything on standard
       error. The user can take advantage of the fact that rsh(1) will
       start the shell non-interactively. The start-up script can exit
       early in this case, before executing many commands relevant
       only to interactive sessions and likely to generate output.

*All* of these prerequisites must be met before LAM will function
properly.  If recon does not complete successfully, the "-d" option
will give verbose descriptions of what it tried to do, and suggestions
to fix the problem.
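
One of the prerequisites above, that the start-up script prints
nothing on standard error, can also be checked by hand.  The sketch
below is a hypothetical helper (stderr_is_silent is not a LAM tool)
that runs any command and succeeds only if the command wrote nothing
to standard error.

```shell
# stderr_is_silent COMMAND [args...]
# Succeeds only if COMMAND writes nothing to standard error.
# Hypothetical helper, not part of LAM.
stderr_is_silent() {
    # Capture stderr only; stdout is discarded.
    err=$("$@" 2>&1 >/dev/null) || :
    [ -z "$err" ]
}

# Example (not run here; "inky" is a host name from this file's
# examples):
#   stderr_is_silent rsh inky true && echo "start-up files are quiet"
```

Note that errors from the remote shell command itself (for example,
a refused connection) also appear on standard error and make the
check fail.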

Also keep in mind that just because recon works, lamboot itself may
still fail.  This usually happens when the "hboot" program (that
lamboot invokes on remote nodes) fails for some reason.  Again, the
"-d" option to lamboot will enable extremely verbose output, and
suggest solutions to common problems.

Users should read the lam(7) manual page to get started using LAM
tools and libraries.

Additionally, the LAM Team offers a "Getting Started with LAM"
tutorial, that, although somewhat biased towards the LAM Team's
computing environment, is a good starting point to getting familiar
with LAM.

		http://www.lam-mpi.org/tutorials/lam/


--- Common filesystems

A common environment to run LAM in is a Beowulf-class or other
workstation cluster.  Simply stated, LAM can run on a group of
workstations connected by a network.  As mentioned above, there are
several prerequisites, however (the user must have an account on all
the machines, the user can rsh [or ssh, or whatever other remote shell
transport capability is desired -- see above for how to change the
underlying remote shell transport] to all the machines, etc.).

This raises the question for LAM system administrators: where to
install the LAM binaries, header files, etc.?  There are two main
choices:

1. Have a common filesystem, such as NFS, between all the machines to
be used.  Install the LAM files such that the LAM executables can be
found in the *same directory* on each node.  This will *greatly*
simplify users' .cshrc/.profile scripts -- the value of the $PATH can
be set without checking which machine the user is on.  It also
simplifies the system administrator's job; when the time comes to
patch or otherwise upgrade LAM, only one copy needs to be modified.

For example, consider a cluster of four machines: inky, blinky, pinky,
and clyde.  If the LAM binaries et al. are installed on inky's local
hard drive in the directory /home/lam, the system administrator has
two main choices:

  - mount inky:/home/lam on the remaining three machines, such that
/home/lam on all machines is effectively "the same".  That is, the
following directories all contain the LAM binaries:

     inky:/home/lam
     blinky:/home/lam
     pinky:/home/lam
     clyde:/home/lam

  - mount inky:/usr/local/src/lam-6.5.9 on *all four* machines in some
other common location, such as /home/lam (a symbolic link can be
installed on inky instead of a mount point for efficiency).  This
strategy is typically used in environments where one tree is NFS
exported, but a different tree is the customary location for
binaries.  For example, the following directories all contain the LAM
binaries:

     inky:/home/lam
     blinky:/home/lam
     pinky:/home/lam
     clyde:/home/lam

Notice that these are the same four directories as in the previous
example, but on inky, the directory is *actually* located in
/usr/local/src/lam-6.5.9.  There is a slight disadvantage to this
approach: each of the remote nodes has to incur NFS (or whatever
filesystem is used) delays to access the LAM directory tree.  However,
the ease of administration and the relatively low cost of using a
networked filesystem usually greatly outweigh this disadvantage.


2. If you are concerned with networked filesystem costs of accessing
the LAM binaries, you can install LAM on the local hard drive of each
node in your system.  Again, it is *highly* advisable to install LAM
in the *same* directory on each node so that users' $PATH can be set
to the same value, regardless of the node that a user has logged on
to.

This approach avoids some of the network latency of accessing the LAM
binaries, but is usually only worthwhile where users are very
concerned about squeezing every spare cycle out of their machines.
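With either choice, installing LAM in the same directory on every node
means that one start-up fragment works everywhere.  The following is a
minimal sketch for a Bourne-style shell.  It assumes the /home/lam
location from the example above and that the executables are in its
bin subdirectory; the helper name prepend_path is hypothetical.

```shell
# Add a directory to the front of $PATH unless it is already present.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already in $PATH; do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed install location; the same value is valid on every node.
LAM_PREFIX=/home/lam
prepend_path "$LAM_PREFIX/bin"
export PATH
```

Because the directory is identical on inky, blinky, pinky, and clyde,
the fragment needs no per-machine test.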


--- Using LAM with AFS

AFS has some peculiarities, especially with file permissions when
using rsh.  However, most sites tend to install the Transarc rsh
replacement (i.e., the one that passes tokens to the remote machine)
as the default rsh, so when you "rsh" to a remote machine (with recon
or lamboot), your AFS token will be passed to the remote LAM daemon
automatically.  If your site does not install the Transarc replacement
rsh as the default, consult the documentation on "--with-rsh" (above)
to see how to set the path to the rsh that LAM will use.

Once you use the replacement rsh, you should get a token on the other
side.  This means that your LAM daemons are running with your AFS
token, and you should be able to run any program that you wish,
including those that are not system:anyuser accessible.  You will even
be able to write into your filespace (as you would expect).

Keep in mind, however, that AFS tokens have limited lives, and will
eventually expire.  This means that your LAM daemons (and user MPI
programs) will lose their AFS permissions after some specified time
unless you renew your token (with the "klog" command, for example) on
the originating machine before the token runs out.  This can play
havoc with long-running MPI programs that periodically write out file
results; if you lose your AFS token in the middle of a run, and your
program tries to write out to a file, it won't have permission to,
which may cause Bad Things to happen.

If you need to run long MPI jobs with LAM on AFS, it is usually
advisable to ask your AFS administrator to increase your default token
lifetime to a large value, such as 2 weeks.


--- Using LAM with ssh

Note that you can change the remote transport agent that LAM uses to
spawn the LAM daemons.  While rsh is the default, it can be changed to
other agents, such as ssh.  

ssh is a popular choice because of the added security that it provides
over the .rhosts security provided by rsh.  And since ssh can pass AFS
tokens, it presents an attractive, highly secure, yet
fully-AFS-authenticated method for invoking LAM.

If you choose to use ssh, the 1.x series of ssh will require the use
of the "-x" command line flag to prevent ssh from printing its
standard banner information to stderr.  lamboot, recon, etc. interpret
any output on stderr to mean that a remote invocation has failed;
ssh's "-x" will prevent this.  (We do not have access to SSH 2.x
clients -- they may require a similar command line flag.)

Note that using ssh (or any other agent) only changes the way that LAM
is *invoked*.  Once LAM is invoked, it sets up its own sockets for
communication that are outside of ssh (and are therefore not
encrypted).  ssh provides stronger security only during lamboot and
wipe.  Once the LAM daemons are launched, all MPI meta information
(such as the startup of user programs) is passed through separate
channels that are independent of ssh.


Troubleshooting
---------------

--- Problems with building LAM

It is highly recommended that you execute the following steps *in
order*.  Many people have similar problems with configuration and
initial setup of LAM, and most common problems have already been
answered in one way or another.

1. Check the LAM FAQ:

     http://www.lam-mpi.org/faq/

2. Check the mailing list archives.  Use the "search" features to
check old posts and see if others have asked the same question and
had it answered:

     http://www.lam-mpi.org/MailArchives/lam/

3. If you do not find a solution to your problem in the above
resources, and your problem specifically has to do with *building*
LAM, send the following information to the LAM mailing list (see the
next section below about sending mail to the LAM mailing list):

- The result of "uname -a" on your system
- The result of "./config/config.guess" from the top-level LAM source
  directory. 
- Output from when you ran "./configure" to configure LAM
- The config.log file from the top-level LAM directory
- The share/include/lam_config.h file
- Output from when you ran "make" to build LAM

To capture the output of the configure and make steps, you can use the
script command or, if using a csh-style shell, the following technique:

     % ./configure {options} |& tee config.LOG
     % make install          |& tee make.LOG

or, if using a Bourne-style shell:

     % ./configure {options} 2>&1 | tee config.LOG
     % make install 2>&1          | tee make.LOG


--- The LAM/MPI Mailing Lists

There are two mailing lists: one for LAM/MPI announcements, and
another for questions and user discussion of LAM/MPI.

1. Announcement list.

This is a low-volume list that is used to announce new versions of
LAM/MPI, important patches, etc.  To subscribe to the LAM announcement
list, visit its list information page (you can also use that page to
unsubscribe or change your subscription options):

     http://www.lam-mpi.org/mailman/listinfo.cgi/lam-announce

2. General discussion/user list.

This list is used for general questions and discussion of LAM/MPI.
Users can post questions, comments, etc. to this list.  Due to problems
with spam, only subscribers are allowed to post to the list.  To
subscribe or unsubscribe from the list, visit the list information
page:

     http://www.lam-mpi.org/mailman/listinfo.cgi/lam

After you have subscribed (and received a confirmation e-mail), you
can send mail to the list at the following address:

     lam@lam-mpi.org

NOTE: People tend to only reply to the list; if you subscribe, post,
and then unsubscribe from the list, you will likely miss replies.

Also please be aware that lam@lam-mpi.org is a list that goes to
several hundred people around the world -- it is not uncommon to move
a high-volume exchange off the list, and only post the final
resolution of the problem/bug fix to the list.  This prevents
exchanges like "Did you try X?", "Yes, I tried X, and it did not
work.", "Did you try Y?", etc. from cluttering up people's inboxes.


--- Problems with running LAM and/or user programs

Check the LAM FAQ and mailing list archive resources mentioned in the
"Problems with building LAM" section, above.  If you do not find the
solution to your problem there, send mail to the LAM mailing list:
lam@lam-mpi.org.

Some typical problems with rsh include the following:

     * Incorrect permissions on a user's home directory
     * Incorrect permissions on $HOME/.rhosts
     * No entry (or incorrect entry) in $HOME/.rhosts

Some typical problems with a user's environment include the following:

     * User's .cshrc/.profile does not put $prefix/bin in the path
     * Insufficient permissions on the program that you are trying to
       run
     * Insufficient permissions on the /tmp directory


--- Insufficient shared resources

When using the sysv or usysv RPIs, the operating system may run out of
shared memory and/or semaphores.  This is typically indicated by
failing to run an MPI program, or failing to run more than X copies of
an MPI program on a single node.

To fix this problem, your operating system settings need to be
modified to increase the allowable shared semaphores/memory.

For Linux, reconfiguration can only be done by building a new kernel.
First modify the appropriate constants in
include/asm-<arch>/shmparam.h.  Increasing SHMMAX will allow larger
shared segments and increasing _SHM_ID_BITS allows for more shared
memory identifiers (this information is likely from 2.0 Linux kernels;
it may or may not have changed in more recent versions).

For Solaris, reconfiguration can be done by modifying /etc/system and
then rebooting. See the Solaris man page system(4).
 
For example, to set the maximum shared memory segment size to 32 MB,
put the following in /etc/system:

     set shmsys:shminfo_shmmax=0x2000000
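
The hexadecimal value above is simply 32 MB expressed in bytes.  As a
small sketch, a value for a different size can be computed in a
Bourne-style shell; the helper name mb_to_hex is hypothetical, and the
right size for your system is still something to take from your system
documentation.

```shell
# Print a size given in megabytes as a hexadecimal byte count,
# in the form used for shminfo_shmmax above.
mb_to_hex() {
    printf '0x%x\n' "$(( $1 * 1024 * 1024 ))"
}

mb_to_hex 32    # prints 0x2000000
```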

If you are using the sysv transport and are running out of semaphores,
the following tunables can be set:

     set semsys:seminfo_semmap=32
     set semsys:seminfo_semmni=128
     set semsys:seminfo_semmns=1024

Please consult your system documentation for help in determining the
correct values for your systems.


Clearing disk space
-------------------

After LAM has been built, all of the objects can be removed by running
the make(1) utility with the "clean" target in the source directory.

     % make clean 

NOTE: If you are using a really picky version of make (such as
OpenBSD's make), you may need to use "make -i clean".

If you're *really* desperate for more space, a bit more space can be
reclaimed by running:

     % make distclean

NOTE: Again, if you are using a really picky version of make (such as
OpenBSD's make), you may need to use "make -i distclean".

If further space is required, the entire source directory can be taken
off-line (indeed, "make distclean" returns the LAM source tree to the
same state as it was when it was unpacked from the original
distribution tarball).  Only the installation directory need be
maintained on-line.


Tuning LAM
----------

There are various constants defined in the LAM header files which
relate to message transfer protocols, shared memory allocation, and so
on.  Some of these are configurable via the configure script; it is
hoped that in time, more and more options will be configurable.

This section is intended to describe some of these constants so that
LAM users can experiment with tuning the MPI library.  It also
provides some description of the transport layer internals which may
help LAM users better understand the behavior and performance they see
from the LAM MPI library.


--- Short/long protocol

LAM MPI uses a short/long message protocol. If a message is "short",
it is sent together with a header in one transfer to the destination
process.  If the message is "long", then a header (possibly with some
data) is sent to the destination.  The sending process then waits for
an acknowledgment from the receiver before sending the rest of the
message data.  The receiving process sends the acknowledgment when a
matching receive is posted.

The crossover point from "short" to "long" message is configurable in
each transport.  See the transport-specific sections below (TCP
transport; Usysv and sysv transports) for further information.


--- Shortcircuit send/receive

Typically, when a message is sent or received, LAM creates a request
structure, fills it with information about the message, links the
request into a list of messages, and calls a progression "engine" to
effect the data transfer.

When there are no active requests and a blocking (standard mode) send
or receive is done, the overhead of creating the request and linking it
into the list can be bypassed (shortcircuited) and the progression
"engine" called directly to effect the transfer.

In prior versions of LAM/MPI, this option was not the default.  It is
now used by default, unless specifically disabled via the configure
script.

--- TCP transport

The crossover point from "short" to "long" message is configurable via
the constant TCPSHORTMSGLEN in share/include/lam_config.h (relative to
the top of the LAM build tree).  It can also be set from the configure
script via the --with-tcp-short option.  The default is 64KB.

This number is relevant to all the RPIs.  The shared memory RPIs are
multi-protocol; they use TCP to communicate with ranks that are not on
the same node.


--- Usysv and sysv transports

Descriptions of the usysv and sysv transports can be found in the "RPI
transport layers" section of the RELEASE_NOTES file.

Configuration constants for the usysv and sysv transports are found in
share/include/rpi.shm.h (from the top of the LAM build directory).

In these transports, processes on different nodes communicate via TCP
sockets.  The crossover point from "short" to "long" messages for
these communications is configurable via the constant TCPSHORTMSGLEN.
It can also be set from the configure script via the --with-tcp-short
option.  The default is 64KB.

Processes located on the same node communicate via shared memory.  The
transport allocates one SYSV shared memory segment, which is shared by
all of the task's processes that are on the node.  This segment is
logically divided into two areas.

The "postbox" area contains postboxes for "short" message
communication.  A postbox is used for one-way communication between
two processes.  The space allocated per postbox is SHMSHORTMSGLEN +
CACHELINESIZE.  SHMSHORTMSGLEN is configurable (via the configure
option --with-shm-short).  It is the crossover point from "short"
to "long" messages in shared memory communication; the default value
is 8 KB.

CACHELINESIZE must be the size of a cache line or a multiple thereof.
The default setting is 64 bytes.  You shouldn't need to change it.
CACHELINESIZE bytes in the postbox are used for a cache-line sized
synchronization location.

The size of the postbox area is np * (np-1) * (SHMSHORTMSGLEN +
CACHELINESIZE) bytes.

The rest of the shared memory area is used as a global pool from which
space for long message transfers is allocated.  Allocation from this
pool is locked.  The default lock mechanism is a SYSV semaphore but
the configure option --with-pthread-lock can be used to change this to
a process shared pthread mutex lock.  The size of this pool is
configurable via the constant LAM_MPI_SHMPOOLSIZE, and by the configure
option --with-shm-poolsize.

The configure script will try to determine a size for the pool if none
is explicitly specified.  You should always check this to see if it is
reasonable.  Larger values should improve performance especially when
an application passes large messages, but will also increase the
system resources used by each task.

The total size of the shared segment allocated is 2 * CACHELINESIZE +
LAM_MPI_SHMPOOLSIZE + np * (np-1) * (SHMSHORTMSGLEN + CACHELINESIZE).
The 2 * CACHELINESIZE bytes are for the global pool lock.
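
As a worked example of the two size formulas, the following shell
sketch evaluates them for given values.  The function names are
hypothetical.  np is taken here to be the number of processes sharing
the segment on the node, which is an assumption, and the pool size of
1048576 bytes is an arbitrary illustration rather than a LAM default
(the configure script chooses the real value).

```shell
# Postbox area: np * (np-1) * (SHMSHORTMSGLEN + CACHELINESIZE)
postbox_area_bytes() {
    np=$1; short=$2; line=$3
    echo "$(( np * (np - 1) * (short + line) ))"
}

# Whole segment: 2 * CACHELINESIZE + LAM_MPI_SHMPOOLSIZE + postbox area
segment_bytes() {
    np=$1; short=$2; line=$3; pool=$4
    postbox=$(postbox_area_bytes "$np" "$short" "$line")
    echo "$(( 2 * line + pool + postbox ))"
}

# 4 processes, default 8 KB short messages, default 64-byte cache line
postbox_area_bytes 4 8192 64          # prints 99072
segment_bytes 4 8192 64 1048576       # prints 1147776
```

Note that the postbox area grows with the square of the number of
processes, so it can come to dominate the segment for large np.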


--- Use of the global pool

When a message larger than 2 * SHMSHORTMSGLEN is sent, the transport
sends SHMSHORTMSGLEN bytes with the first packet.  When the
acknowledgment is received, it allocates (message length -
SHMSHORTMSGLEN) bytes from the global pool to transfer the rest of the
message.

To prevent a single large message transfer from monopolizing the
global pool, allocations from the pool are actually restricted to a
maximum of LAM_MPI_SHMMAXALLOC bytes.  Even with this restriction, it
is possible for the global pool to temporarily become exhausted.  In
this case, the transport will fall back to using the postbox area to
transfer the message.  Performance will be degraded, but the
application will progress.

LAM_MPI_SHMMAXALLOC is configurable via the configure option
--with-shm-maxalloc or editing rpi.shm.h.
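
The allocation rule described in this section can be sketched as
follows.  This is an illustration of the arithmetic only, not LAM's
code: the function name is hypothetical, the value 65536 used for
LAM_MPI_SHMMAXALLOC is an arbitrary assumption, and messages at or
below twice SHMSHORTMSGLEN are deliberately not handled because this
section does not describe them.

```shell
# Print how many bytes would be requested from the global pool for a
# message of the given length.  Fails for messages that are not
# larger than 2 * SHMSHORTMSGLEN.
pool_request_bytes() {
    len=$1; short=$2; maxalloc=$3
    [ "$len" -gt "$(( 2 * short ))" ] || return 1
    rest=$(( len - short ))          # first packet carries $short bytes
    if [ "$rest" -gt "$maxalloc" ]; then
        rest=$maxalloc               # capped by LAM_MPI_SHMMAXALLOC
    fi
    echo "$rest"
}

pool_request_bytes 20000 8192 65536     # prints 11808
pool_request_bytes 100000 8192 65536    # prints 65536 (capped)
```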


--- Synchronization

The usysv and sysv transports differ only in the mechanism used to
synchronize the transfer of messages via shared memory.  The usysv
transport uses spin locks with back-off, while the sysv transport uses
SYSV semaphores.

Both transports use a few SYSV semaphores for synchronizing the
deallocation of shared structures or for synchronizing access to the
shared pool.

The usysv transport should be superior to the sysv transport on
multiprocessors.  On uniprocessors, which is better depends on the OS
and the means used for processor yielding.  On a Linux uniprocessor,
for example, using semaphores (sysv transport) appears to be vastly
superior to spin-locking.


--- Usysv transport spin-locks

The usysv transport uses spin locks with back-off.  When a process
backs off, it attempts to yield the processor.  If the configure
script found a system-provided yield function such as yield() or
sched_yield(), this is used.  If no such function is found, then
select() on NULL file descriptor sets with a timeout of 10
microseconds is used.

The use of select() to yield can be forced by the --with-select-yield
option to the configure script.


--- Sysv transport semaphores

The sysv transport allocates a semaphore set (of size 6) for each
process pair communicating via shared memory.  On some systems, you
may need to reconfigure the system to allow for more semaphore sets if
running tasks with many processes communicating via shared memory.
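
To get a rough idea of how many semaphore sets a task may need on one
node, the following sketch counts process pairs.  It assumes that
"process pair" means an unordered pair, giving np * (np-1) / 2 sets of
6 semaphores each; that reading is an assumption, not something this
document states, and the function name is hypothetical.  Compare the
result against your system's limits on semaphore sets and on total
semaphores.

```shell
# Print the estimated number of semaphore sets and of individual
# semaphores for np processes on one node.
sysv_semaphore_needs() {
    np=$1
    sets=$(( np * (np - 1) / 2 ))    # one set per unordered pair
    echo "$sets $(( sets * 6 ))"     # each set holds 6 semaphores
}

sysv_semaphore_needs 8    # prints "28 168"
```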
