== Description ==

mpi-start is an abstraction layer that offers a single interface to start parallel jobs with different execution environment implementations. It provides support for different MPI implementations.

== Installation ==

Normally users do not need to install mpi-start. However, to use it at a site without an existing installation, the recommendation is to create a tarball installation that can be transferred in the input sandbox of the job. In order to create a tarball installation, [[http://devel.ifca.es/hg/mpi-start|get the source code]] and do the following:

{{{
$ make tarball
}}}

This will create a mpi-start-X.Y.Z.tar.gz (with X.Y.Z being the version of mpi-start) that contains everything needed for the execution of jobs. In your job script, unpack the tarball and set the `I2G_MPI_START` environment variable to `$PWD/bin/mpi-start`, as shown below.
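For example, a job script along these lines would use such a tarball installation (a sketch: the tarball version X.Y.Z and the application binary `my_app` are placeholders):

{{{#!highlight sh
#!/bin/sh
# unpack the mpi-start tarball shipped in the job's input sandbox
tar xzf mpi-start-X.Y.Z.tar.gz
# point I2G_MPI_START at the unpacked copy
export I2G_MPI_START=$PWD/bin/mpi-start
# run the application (my_app is a placeholder binary) through mpi-start
$I2G_MPI_START -t openmpi ./my_app
}}}

== Usage ==

mpi-start can be controlled via environment variables or command line switches. Most configuration dependent parameters are automatically detected by mpi-start and do not need to be specified by the user. The following command line is enough to run the application with the site defaults:

{{{
$ mpi-start application [application arguments ...]
}}}

=== Command Line Options ===

 -h :: show help message and exit
 -V :: show mpi-start version
 -t mpi_type :: use `mpi_type` as MPI implementation
 -v :: be verbose
 -vv :: include debug information
 -vvv :: include full trace
 -pre hook :: use `hook` as pre-hook file
 -post hook :: use `hook` as post-hook file
 -pcmd cmd :: use `cmd` as pre-command
 -npnode ''n'' :: start ''n'' processes per node
 -pnode :: start 1 process per node
 -np ''n'' :: start exactly ''n'' processes
 -i file :: use `file` as standard input file
 -o file :: use `file` as standard output file
 -e file :: use `file` as standard error file
 -x VAR[=VALUE] :: define variable `VAR` with optional `VALUE` for the application's environment (will not be seen by mpi-start!)
 -d VAR=VALUE :: define variable `VAR` with `VALUE`
 -- :: optional separator for the application and its arguments; anything after it is treated as the application to run and its arguments

For example, the following command line would start /bin/hostname 3 times per available node using Open MPI:

{{{
$ mpi-start -t openmpi -npnode 3 -- /bin/hostname
}}}

=== Environment Variables ===

Prior to version 1.0.0, mpi-start only used environment variables to control its behavior. This is still possible, although command line options override any environment variables defined. The next table shows the complete list of variables, with the command line options that can be used to set them:

|| '''Variable''' || '''cmd line option''' || '''Meaning''' ||
||`I2G_MPI_APPLICATION` || || The application binary to execute. ||
||`I2G_MPI_APPLICATION_ARGS` || || The command line parameters for the application. ||
||`I2G_MPI_TYPE` || -t || The name of the MPI implementation to use. ||
||`I2G_MPI_PRE_RUN_HOOK` || -pre || This variable can be set to a script which must define the `pre_run_hook` function. This function will be called after the MPI support has been established and before the internal pre-run hooks. This hook can be used to prepare input data or compile the program. ||
||`I2G_MPI_POST_RUN_HOOK` || -post || This variable can be set to a script which must define the `post_run_hook` function. This function will be called after mpirun has finished. ||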
||`I2G_MPI_START_VERBOSE` || -v || Set to 1 to turn on additional output. ||
||`I2G_MPI_START_DEBUG` || -vv || Set to 1 to enable debugging output. ||
||`I2G_MPI_START_TRACE` || -vvv || Set to 1 to trace every operation that is performed by mpi-start. ||
||`I2G_MPI_APPLICATION_STDIN` || -i || Standard input file to use. ||
||`I2G_MPI_APPLICATION_STDOUT` || -o || Standard output file to use. ||
||`I2G_MPI_APPLICATION_STDERR` || -e || Standard error file to use. ||
||`I2G_MPI_SINGLE_PROCESS` || -pnode || Set it to 1 to start only one process per node. ||
||`I2G_MPI_PER_NODE` || -npnode || Number of processes to start per node. ||
||`I2G_MPI_NP` || -np || Total number of processes to start. ||

{{{#!wiki comment
||`I2G_MPI_PER_SOCKET` || -npsocket || Number of processes to start per CPU socket. ||
||`I2G_MPI_PER_CORE` || -npcore || Number of processes to start per core. ||
||`I2G_MPI_SINGLE_SOCKET` || -psocket || Set it to 1 to start only one process per CPU socket. ||
||`I2G_MPI_SINGLE_CORE` || -pcore || Set it to 1 to start only one process per core. ||
}}}

These variables can also be set with the `-d` command line switch. The following example shows how to set the `I2G_MPI_TYPE` variable to `openmpi`:

{{{
mpi-start -d I2G_MPI_TYPE=openmpi
}}}

There are also other variables that can modify the behaviour of mpi-start; they are described in other sections of this wiki. The ones dealing with site configuration of mpi-start are documented in the [[../SiteConfiguration|Site Administrator manual]], and the variables dealing with the hooks are summarized in [[Middleware/MpiStart/UserDocumentation/HooksFramework|Hooks Framework]].
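As an illustration of the equivalence between the two mechanisms, the following sketch shows the same run expressed once with environment variables and once with command line options (`my_app` is a hypothetical binary used only for illustration):

{{{#!highlight sh
#!/bin/sh
# environment variable style
export I2G_MPI_APPLICATION=./my_app   # my_app is a placeholder
export I2G_MPI_TYPE=openmpi
export I2G_MPI_PER_NODE=2
export I2G_MPI_START_VERBOSE=1
$I2G_MPI_START

# equivalent command line style; if both were present,
# the command line options would take precedence
# mpi-start -t openmpi -npnode 2 -v ./my_app
}}}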
== Scheduler and Execution Environment Support ==

mpi-start supports different combinations of batch schedulers and execution environments using plugins. The scheduler is automatically detected from the environment, while the execution environment can be selected with the `I2G_MPI_TYPE` variable or the `-t` command line option.

|||| '''Scheduler Plugins''' ||
|| `sge` || supports [[http://gridengine.sunsource.net/|Grid Engine]]. ||
|| `pbs` || supports [[http://www.clusterresources.com/products/torque-resource-manager.php|PBS/Torque]]. ||
|| `lsf` || supports [[http://www.platform.com/Products/platform-lsf|LSF]]. ||
|| `condor` || supports [[http://www.cs.wisc.edu/condor/|Condor]]. This plugin lacks the possibility to select how many processes per node should be run. ||
|| `slurm` || supports [[https://computing.llnl.gov/linux/slurm/|Slurm]]. As with condor, the plugin currently lacks the processes per node support. ||

|||| '''Execution Environment Plugins''' ||
|| `openmpi` || [[http://www.open-mpi.org/|Open MPI]] ||
|| `mpich2` || [[http://www.mcs.anl.gov/research/projects/mpich2/|MPICH2]] ||
|| `mpich` || [[http://www.mcs.anl.gov/research/projects/mpi/mpich1-old/|MPICH]] ||
|| `lam` || [[http://www.lam-mpi.org/|LAM-MPI]] ||
|| `pacx` || [[http://www.hlrs.de/organization/av/amt/research/pacx-mpi/|PACX-MPI]] ||
|| `dummy` || Debugging environment, just executes the application on the current host. ||

== Hooks ==

The hooks framework opens the possibility of customizing the behavior of mpi-start. Users can provide their own hooks to perform any pre (e.g. compilation of binaries, data fetching) or post (e.g. storage of application results, clean-up) actions needed for the execution of their application. The [[Middleware/MpiStart/UserDocumentation/HooksFramework|Hooks Framework]] page describes the framework and the creation of user hooks in detail.

== System configuration ==

mpi-start can be configured to use the best options for the site. Check the [[../SiteConfiguration|Site Administrator manual]] for more information.

== Examples ==

=== Simple Job ===

Simple job using environment variables:

{{{#!highlight sh
#!/bin/sh
# IMPORTANT: this example script executes a
# non-MPI program with Open MPI
#
export I2G_MPI_APPLICATION=/bin/hostname
export I2G_MPI_TYPE=openmpi
$I2G_MPI_START
}}}

Same example using command line parameters:

{{{
mpi-start -t openmpi /bin/hostname
}}}

=== Job with user specified hooks ===

{{{#!highlight sh
#!/bin/sh
#
# MPI_START_SHARED_FS can be used to figure out if the current working
# directory is located on a shared file system or not (1=yes, 0=no).
#
# The "mpi_start_foreach_host" function takes as parameter the name of
# another function that will be called for each host in the machinefile,
# with the host name as first parameter.
# - For each host the callback function will be called exactly once,
#   independent of how often the host appears in the machinefile.
# - The callback function will also be called for the local host.

# create the pre-run hook
cat > pre_run_hook.sh << EOF
pre_run_hook () {
    echo "pre run hook called"
    # - download data
    # - compile program
    if [ "x\$MPI_START_SHARED_FS" = "x0" ] ; then
        echo "If we need a shared file system we can return -1 to abort"
        # return -1
    fi
    return 0
}
EOF

# create the post-run hook
cat > post_run_hook.sh << EOF
# the first parameter is the name of a host in the machinefile
my_copy () {
    CMD="scp . \$1:\$PWD/mydata.1"
    echo \$CMD
    #\$CMD # upload data
}

post_run_hook () {
    echo "post_run_hook called"
    if [ "x\$MPI_START_SHARED_FS" = "x0" ] ; then
        echo "gather output from remote hosts"
        mpi_start_foreach_host my_copy
    fi
    return 0
}
EOF

export I2G_MPI_APPLICATION=mpi_sleep
export I2G_MPI_APPLICATION_ARGS=0
export I2G_MPI_TYPE=openmpi
export I2G_MPI_PRE_RUN_HOOK=./pre_run_hook.sh
export I2G_MPI_POST_RUN_HOOK=./post_run_hook.sh

$I2G_MPI_START

# instead of the variable definitions, the following command line could be used:
# mpi-start -t openmpi -pre ./pre_run_hook.sh -post ./post_run_hook.sh mpi_sleep 0
}}}

== Using mpi-start with grid middleware ==

=== WMS ===

EMI provides the WMS service for submitting jobs to the different available resources. The WMS receives a job description in the JDL language and performs the selection and actual submission of the job to the resources on behalf of the user. The following sections describe how to submit a job using the WMS.

==== Basic Job Submission ====

Jobs are described with the [[https://edms.cern.ch/document/590869/1|JDL language]]. The most relevant attributes for parallel job submission are:

 * `CPUNumber`: number of processes to allocate.
 * `Requirements`: requirements of the job; these allow forcing the selection of sites with mpi-start support.

The following example shows a job that uses 6 processes and is executed with Open MPI. The `Requirements` attribute makes the WMS select sites that publish support for both mpi-start and Open MPI:

{{{
JobType = "Normal";
CPUNumber = 6;
Executable = "starter.sh";
Arguments = "OPENMPI hello_bin hello arguments";
InputSandbox = {"starter.sh", "hello_bin"};
OutputSandbox = {"std.out", "std.err"};
StdOutput = "std.out";
StdError = "std.err";
Requirements =
    member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
    member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
}}}
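The same pattern applies to the other MPI flavours. Assuming the site publishes the corresponding runtime tag, an MPICH2 job would only change the flavour name in `Arguments` and `Requirements` (a sketch of just the changed attributes):

{{{
Arguments = "MPICH2 hello_bin hello arguments";
Requirements =
    member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
    member("MPICH2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
}}}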
The `Executable` attribute is a script that will invoke mpi-start with the correct options for the execution of the user's application. We propose a generic wrapper that can be used for any application and MPI flavour and that receives in the `Arguments` attribute:

 * the name of the mpi-start execution environment (`I2G_MPI_TYPE` variable), in the example: OPENMPI
 * the name of the user binary, in the example: hello_bin
 * the arguments for the user binary, in the example: hello arguments

This is the content of the wrapper:

{{{#!highlight sh
#!/bin/bash

# Pull in the arguments.
MPI_FLAVOR=$1
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
shift
export I2G_MPI_APPLICATION=$1
shift
export I2G_MPI_APPLICATION_ARGS=$*

# Touch the executable, and make sure it's executable.
touch $I2G_MPI_APPLICATION
chmod +x $I2G_MPI_APPLICATION

# Invoke mpi-start.
$I2G_MPI_START
}}}

Users need to include this wrapper in the `InputSandbox` of the JDL (`starter.sh`) and set it as the `Executable` of the job. Submission is performed as for any other job:

{{{
$ glite-wms-job-submit -a hello-mpi.jdl
Connecting to the service https://gridwms01.ifca.es:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://gridwms01.ifca.es:9000/8jG3MUNRm-ol7BqhFP5Crg

==========================================================================
}}}

Once the job has finished, the output can be retrieved:

{{{
$ glite-wms-job-output https://gridwms01.ifca.es:9000/8jG3MUNRm-ol7BqhFP5Crg
Connecting to the service https://gridwms01.ifca.es:7443/glite_wms_wmproxy_server

================================================================================

                        JOB GET OUTPUT OUTCOME

Output sandbox files for the job:
https://gridwms01.ifca.es:9000/8jG3MUNRm-ol7BqhFP5Crg
have been successfully retrieved and stored in the directory:
/gpfs/csic_projects/grid/tmp/jobOutput/enol_8jG3MUNRm-ol7BqhFP5Crg

================================================================================

$ cat /gpfs/csic_projects/grid/tmp/jobOutput/enol_8jG3MUNRm-ol7BqhFP5Crg/std.*
Hello world from gcsic054wn. Process 3 of 6
Hello world from gcsic054wn. Process 1 of 6
Hello world from gcsic054wn. Process 2 of 6
Hello world from gcsic054wn. Process 0 of 6
Hello world from gcsic055wn. Process 4 of 6
Hello world from gcsic055wn. Process 5 of 6
}}}

==== Modifying mpi-start behavior ====

mpi-start behavior can be customized by setting different environment variables (see the [[#Usage|usage section]] for a complete list). When using the generic wrapper, an easy way of customizing the mpi-start execution is the `Environment` attribute of the JDL. The following JDL adds debugging to the previous example by setting the `I2G_MPI_START_VERBOSE` and `I2G_MPI_START_DEBUG` variables to 1:

{{{
JobType = "Normal";
CPUNumber = 6;
Executable = "starter.sh";
Arguments = "OPENMPI hello_bin hello arguments";
InputSandbox = {"starter.sh", "hello_bin"};
OutputSandbox = {"std.out", "std.err"};
StdOutput = "std.out";
StdError = "std.err";
Requirements =
    member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
    member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Environment = {"I2G_MPI_START_VERBOSE=1", "I2G_MPI_START_DEBUG=1"};
}}}
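Any of the variables from the usage section can be injected the same way. For instance, a job that should start only one process per node would set (a sketch of just the changed attribute):

{{{
Environment = {"I2G_MPI_SINGLE_PROCESS=1"};
}}}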
Use of hooks (see the [[/HooksFramework|Hooks Framework]]) is also possible using this mechanism. If the user has a file with the mpi-start hooks called `hooks.sh`, the following JDL would add it to the execution (notice that the file is also added to the `InputSandbox`):

{{{
JobType = "Normal";
CPUNumber = 6;
Executable = "starter.sh";
Arguments = "OPENMPI hello_bin hello arguments";
InputSandbox = {"starter.sh", "hello_bin", "hooks.sh"};
OutputSandbox = {"std.out", "std.err"};
StdOutput = "std.out";
StdError = "std.err";
Requirements =
    member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
    member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Environment = {"I2G_MPI_PRE_RUN_HOOK=hooks.sh", "I2G_MPI_POST_RUN_HOOK=hooks.sh"};
}}}
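For reference, such a `hooks.sh` could look like the following minimal sketch. Since the JDL above points both `I2G_MPI_PRE_RUN_HOOK` and `I2G_MPI_POST_RUN_HOOK` at the same file, it defines both functions; the echoed actions are placeholders:

{{{#!highlight sh
#!/bin/sh
# hooks.sh -- illustrative sketch of a user hook file

pre_run_hook () {
    # called before mpirun: e.g. compile sources or fetch input data
    echo "pre_run_hook called"
    return 0
}

post_run_hook () {
    # called after mpirun has finished: e.g. collect or store results
    echo "post_run_hook called"
    return 0
}
}}}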