Copyright 1998-2001, University of Notre Dame.
Authors: Jeffrey M. Squyres and Arun Rodrigues with Brian Barrett,
         Kinis L. Meyer, M. D. McNally, and Andrew Lumsdaine

This file is part of the Notre Dame LAM implementation of MPI.

You should have received a copy of the License Agreement for the Notre
Dame LAM implementation of MPI along with the software; see the file
LICENSE.  If not, contact Office of Research, University of Notre
Dame, Notre Dame, IN 46556.

Permission to modify the code and to distribute modified code is
granted, provided the text of this NOTICE is retained, a notice that
the code was modified is included with the above COPYRIGHT NOTICE and
with the COPYRIGHT NOTICE in the LICENSE file, and that the LICENSE
file is distributed with the modified code.

LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
By way of example, but not limitation, Licensor MAKES NO
REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE COMPONENTS
OR DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS
OR OTHER RIGHTS.

Additional copyrights may follow.


Mandelbrot is a simple example of the master/slave parallel
programming technique, written in C.  It runs one master process,
which dynamically spawns any number of slaves.  Because the program
spawns the slave processes itself, you only need to launch the
master.  The master writes the computed image to a file in Sun
rasterfile format.  Try viewing it with X11/xv.
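
For readers unfamiliar with the Sun rasterfile format, here is a
minimal sketch of its 32-byte header: eight 32-bit big-endian fields
starting with the magic number 0x59a66a95.  The function names are
hypothetical, and the choice of an 8-bit image with no colormap is an
assumption; this README does not say which depth or colormap master.c
actually writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_MAGIC        0x59a66a95u  /* identifies a Sun rasterfile */
#define RAS_HEADER_BYTES 32           /* eight 32-bit fields */

/* Store v at dst in big-endian byte order, whatever the host order. */
static void put_be32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Bytes per stored row of an 8-bit image: rows are padded to 16 bits. */
uint32_t ras_row_bytes(uint32_t width)
{
    return (width + 1u) & ~1u;
}

/* Fill out[] with the header of an 8-bit image without a colormap
 * (an assumption of this sketch, not taken from master.c). */
void ras_encode_header(unsigned char *out, uint32_t width, uint32_t height)
{
    assert(out != NULL);
    put_be32(out + 0,  RAS_MAGIC);
    put_be32(out + 4,  width);
    put_be32(out + 8,  height);
    put_be32(out + 12, 8u);                             /* depth in bits */
    put_be32(out + 16, ras_row_bytes(width) * height);  /* image bytes */
    put_be32(out + 20, 1u);                             /* RT_STANDARD */
    put_be32(out + 24, 0u);                             /* RMT_NONE */
    put_be32(out + 28, 0u);                             /* colormap bytes */
}
```

A writer would emit these 32 bytes with fwrite() and then the pixel
rows, each padded to ras_row_bytes(width) bytes.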

This application contains some degree of fault tolerance.  Slave
*nodes* can die and the application will continue with fewer slaves,
as long as at least one slave is alive.  If only an individual slave
process dies, the entire application will abort -- this example is
aimed at showing that LAM/MPI can continue if an entire node
(including the LAM daemon on that node) crashes.  To test this, try
executing the 'tkill' program on a slave node while the program is
running.  This will kill the LAM daemon and slave process on that
node.  (Do not run 'tkill' on the node with the master.)

Note that this application is only an example, and is not a
full-featured fault-tolerant application.  For example, if a slave
dies, the master does not contain any extra logic to reassign the
lost work to a different slave.  As such, the resulting output image
may contain a "hole" showing the work that would have been performed
by the dead slave.  Making the master more robust is an exercise left
for the reader.  :-)
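
As a starting point for that exercise, here is a minimal sketch of
the bookkeeping the master would need in order to reassign lost work.
It is not part of this example: the work_tracker type, its functions,
and the idea of numbering work items 0..total-1 are all hypothetical.
The master remembers which item each slave holds and, when a slave
dies, puts that item on a retry stack that is served before any fresh
work.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 64
#define NO_WORK    (-1)

/* Hypothetical master-side record of who is working on what. */
struct work_tracker {
    int next_fresh;             /* next item never handed out */
    int total;                  /* number of work items overall */
    int nretry;                 /* items waiting to be reassigned */
    int retry[MAX_SLAVES];      /* items lost to dead slaves */
    int in_flight[MAX_SLAVES];  /* item held by each slave, or NO_WORK */
};

void tracker_init(struct work_tracker *t, int total)
{
    int i;
    assert(t != NULL && total >= 0);
    t->next_fresh = 0;
    t->total = total;
    t->nretry = 0;
    for (i = 0; i < MAX_SLAVES; ++i)
        t->in_flight[i] = NO_WORK;
}

/* Pick work for an idle, live slave: lost items first, then fresh ones.
 * Returns NO_WORK when nothing is left to hand out. */
int tracker_assign(struct work_tracker *t, int slave)
{
    int item = NO_WORK;
    assert(slave >= 0 && slave < MAX_SLAVES);
    assert(t->in_flight[slave] == NO_WORK);
    if (t->nretry > 0)
        item = t->retry[--t->nretry];
    else if (t->next_fresh < t->total)
        item = t->next_fresh++;
    t->in_flight[slave] = item;
    return item;
}

/* The slave returned its result; it holds nothing now. */
void tracker_done(struct work_tracker *t, int slave)
{
    assert(slave >= 0 && slave < MAX_SLAVES);
    t->in_flight[slave] = NO_WORK;
}

/* The slave died; queue whatever it held for another slave. */
void tracker_slave_died(struct work_tracker *t, int slave)
{
    assert(slave >= 0 && slave < MAX_SLAVES);
    if (t->in_flight[slave] != NO_WORK) {
        assert(t->nretry < MAX_SLAVES);
        t->retry[t->nretry++] = t->in_flight[slave];
        t->in_flight[slave] = NO_WORK;
    }
}
```

The image is complete once tracker_assign() returns NO_WORK and no
live slave still holds an item.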

The fault tolerance relies on the MPI system reporting errors from
MPI functions whose communicator includes a dead slave.  Since the
application creates a separate communicator for each slave, the master
can tell from a returned error which slave has died.  The application
cannot tolerate the untimely death of the master, although that could
be addressed with mirroring.
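
The per-slave error tracking described above can be sketched as
follows.  This is not the code in master.c: the slave_pool type and
its functions are hypothetical, and the value 0 stands in for
MPI_SUCCESS.  It also assumes that the master has arranged for MPI
errors to come back as return codes (for example with the
MPI_ERRORS_RETURN error handler) instead of aborting.  Because each
slave has its own communicator, a failed call identifies exactly one
slave.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 64
#define RC_OK      0   /* stands in for MPI_SUCCESS in this sketch */

/* Hypothetical master-side liveness table, one entry per slave. */
struct slave_pool {
    int nslaves;            /* slaves spawned at startup */
    int nalive;             /* slaves not yet known to be dead */
    int alive[MAX_SLAVES];  /* 1 while the slave is believed alive */
};

void pool_init(struct slave_pool *p, int nslaves)
{
    int i;
    assert(p != NULL && nslaves >= 0 && nslaves <= MAX_SLAVES);
    p->nslaves = nslaves;
    p->nalive = nslaves;
    for (i = 0; i < MAX_SLAVES; ++i)
        p->alive[i] = (i < nslaves);
}

/* Record the return code of an MPI call made on the communicator that
 * belongs to `slave`.  Any error marks that slave dead, once.
 * Returns the number of slaves still alive; the master can keep going
 * while this is greater than zero. */
int pool_report(struct slave_pool *p, int slave, int rc)
{
    assert(p != NULL && slave >= 0 && slave < p->nslaves);
    if (rc != RC_OK && p->alive[slave]) {
        p->alive[slave] = 0;
        p->nalive--;
    }
    return p->nalive;
}
```

In the master's main loop, the result of each send or receive on a
slave's communicator would be passed to pool_report(), and slaves
whose alive flag is 0 would be skipped from then on.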

Use "make" to compile this example.  Make will use mpicc to compile
both programs:

        mpicc -o master master.c 
        mpicc -o slave slave.c 

To run this program, first boot LAM across your cluster with the
"lamboot" command. Then, you can run the master program on one node
with mpirun:

	mpirun n0 ./master

or you can launch "master" directly without mpirun, since this
program only needs an MPI_COMM_WORLD size of one rank:

	./master

NOTE: This example requires that the executable "slave" be available
on all nodes.
