
Troubleshooting Guide

Check the mpi-start page for more information.

Configuration

yaim plugin does not publish MPI-* tags into the RunTimeEnvironment

  1. Check that you are using the MPI_CE node type and that it is the first node type in the command line.

  2. Check the yaim log for messages like Added <FLAVOUR> to set to CE_RUNTIMEENV. If they do not appear, you have probably not enabled any MPI flavour in your yaim profile.
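The log check in step 2 can be automated. The following is a minimal sketch, assuming the log has already been read into a list of lines and that the message wording matches the one quoted above (it may differ between yaim versions). The function name and the sample log lines in the usage note are hypothetical.

```python
import re

# Pattern derived from the message quoted in the guide; the exact wording
# in a real yaim log is an assumption and may need adjusting.
ADDED_RE = re.compile(r"Added (\S+) to .*CE_RUNTIMEENV")


def enabled_flavours(log_lines):
    """Return the MPI flavours the log reports as added to CE_RUNTIMEENV."""
    found = []
    for line in log_lines:
        match = ADDED_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

An empty result for a complete yaim log suggests that no flavour is enabled in the profile. For instance, enabled_flavours(["Added OPENMPI to set to CE_RUNTIMEENV"]) returns ["OPENMPI"].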

Compilation

Compiler not found

Although sites supporting MPI are advised to install the MPI compiler, it may not be available at every site. Some sites have the -devel packages installed but lack the underlying compiler (gcc). With Open MPI, a message like this is shown in that case:

--------------------------------------------------------------------------
The Open MPI wrapper compiler was unable to find the specified compiler
gcc in your PATH.

Note that this compiler was either specified at configure time or in
one of several possible environment variables.
--------------------------------------------------------------------------

You should install the C/C++/Fortran compilers to fully support the compilation of MPI applications.
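A quick way to confirm whether the compilers are present on a worker node is to look them up in PATH. This is a minimal sketch; the compiler names in DEFAULT_COMPILERS are illustrative assumptions and should be replaced with the ones your site is expected to provide.

```python
import shutil

# Illustrative compiler names (an assumption, not a list taken from mpi-start).
DEFAULT_COMPILERS = ("gcc", "g++", "gfortran")


def missing_compilers(names=DEFAULT_COMPILERS, which=shutil.which):
    """Return the compiler executables that cannot be found in PATH."""
    return [name for name in names if which(name) is None]
```

Calling missing_compilers() on the node returns an empty list when every listed compiler is found. The which parameter exists only so that the lookup can be replaced in tests.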

Incompatible Libraries

This happens when the available compiler does not match the installed libraries, or when compiler paths are set incorrectly. mpi-start should determine the correct compiler flags (32/64 bits) and set them in the MPI_<COMPILER>_FLAGS variables, where COMPILER is one of MPICC (C), MPICXX (C++), MPIF90 (Fortran 90) or MPIF77 (Fortran 77). Use the matching variable when compiling.
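As an illustration of using these variables, the sketch below builds a compile command that picks up the flags for a given wrapper. It assumes the variable name follows the MPI_<COMPILER>_FLAGS scheme described above and that the wrapper executable is the lowercase form of COMPILER (for example mpicc for MPICC). Both the function name and that naming assumption are hypothetical.

```python
import os
import shlex


def compile_command(wrapper, sources, output, env=None):
    """Build a compile command that includes the flags set for the wrapper.

    The flags are read from MPI_<COMPILER>_FLAGS, e.g. MPI_MPICC_FLAGS for
    mpicc. If the variable is not set, no extra flags are added.
    """
    env = os.environ if env is None else env
    flags = shlex.split(env.get("MPI_%s_FLAGS" % wrapper.upper(), ""))
    return [wrapper] + flags + ["-o", output] + list(sources)
```

The returned list can be passed to subprocess.run inside the job script. Splitting the variable with shlex.split keeps quoted flag values intact.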

Execution

mpiexec errors

Some sites have reported errors related to bad usage of OSC Mpiexec. Sample error messages:

error while loading shared libraries: libtorque.so.0: cannot open shared object file: No such file or directory

mpiexec: Error: PBS_JOBID not set in environment.

These errors are caused by using a version of Mpiexec that does not match the installed Torque, or by trying to use this starter at a non-PBS site.
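The two sample messages can be mapped to their likely causes when scanning job output. This is a minimal sketch that only matches the messages quoted above; the function name and the hint texts are illustrative, and other Mpiexec errors are not covered.

```python
def diagnose_mpiexec(stderr_text):
    """Return hints for the OSC Mpiexec errors quoted in this guide."""
    hints = []
    if "libtorque.so" in stderr_text:
        # The shared library could not be loaded by the mpiexec binary.
        hints.append("Mpiexec does not match the installed Torque version")
    if "PBS_JOBID not set" in stderr_text:
        # Mpiexec was started outside a PBS/Torque job.
        hints.append("Mpiexec was used outside a PBS job or at a non-PBS site")
    return hints
```

An empty list means that neither of the known messages was found in the output.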

cannot find scheduler

If mpi-start is not able to detect the batch system being used, it will issue a cannot find scheduler error message and exit with code 3. This is normally due to misconfiguration of the batch system or the Computing Element.
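A wrapper script that launches mpi-start can tell this failure apart from other errors by checking both the exit code and the message. The sketch below assumes the output of the run has been captured as text; the function name and the returned labels are hypothetical.

```python
# Exit code that the guide gives for a scheduler detection failure.
SCHEDULER_NOT_FOUND = 3


def classify_exit(returncode, output):
    """Classify the result of an mpi-start run from its exit code and output."""
    if returncode == 0:
        return "ok"
    if returncode == SCHEDULER_NOT_FOUND and "cannot find scheduler" in output:
        return "scheduler-not-detected"
    return "other-failure"
```

Checking the message as well as the code avoids misreporting an application that happens to exit with code 3 for its own reasons.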

SGE

Support for MPI jobs in SGE requires configuring a Parallel Environment and enabling it for jobs submitted from the Computing Element. The current CREAM SGE support selects any available parallel environment (it uses the -pe * option). If your job fails to start with a cannot find scheduler error from mpi-start, the parallel environment is probably not configured properly.
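One way to confirm from inside a job that it was started under a parallel environment is to check the variables that SGE exports to parallel jobs, PE and PE_HOSTFILE. This is a minimal sketch; whether mpi-start relies on exactly these variables for its detection is an assumption, and the function name is hypothetical.

```python
import os

# Variables that SGE sets for jobs running in a parallel environment.
SGE_PE_VARS = ("PE", "PE_HOSTFILE")


def missing_sge_pe_vars(env=None):
    """Return the SGE parallel environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SGE_PE_VARS if not env.get(name)]
```

If the function returns a non-empty list when run inside the job, the job was most likely not scheduled through a parallel environment, which points to the configuration problem described above.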

eciencia: Middleware/MpiStart/TroubleshootingGuide (last edited 2011-09-20 07:40:38 by enol)