Altamira User's Guide (deprecated)

Please go to this link to see the new userguide: https://confluence.ifca.es/display/IC/Altamira+Users+Guide

Introduction

This user's guide for the Altamira supercomputer is intended to provide the minimum amount of information needed by a new user of this system. As such, it assumes that the user is familiar with the basic notions of scientific computing, in particular the basic commands of the Unix operating system, and with basic techniques for executing applications on a supercomputer, such as MPI or OpenMP.

The information in this guide includes basic technical documentation about the Altamira supercomputer, the software environment, and the available applications.

Please read it carefully and if any doubt arises don't hesitate to contact our support group.

System Overview

Altamira comprises 158 main compute nodes, 5 additional GPU compute nodes, a login server and several service servers.

Main compute nodes have two Intel Sandy Bridge E5-2670 processors, each with 8 cores operating at 2.6 GHz and a 20 MB cache, 64 GB of RAM (i.e. 4 GB/core) and a 500 GB local disk.

Main compute nodes run Scientific Linux (currently version 6.2).

The internal network in Altamira includes:

  • Infiniband Network (FDR): high-bandwidth network used for parallel application communication and data transfer.
  • Gigabit Network: Ethernet network used by the management services.

All the nodes are connected to a global storage system based on GPFS (Global Parallel File System) providing a total of roughly 2 PB.

The Altamira system is connected to the Internet through a direct connection to RedIris Nova at 1 Gbit/s (to be upgraded to 10 Gbit/s soon).

Requesting an account

Altamira users include researchers at the University of Cantabria, researchers who obtain execution time through the Spanish Supercomputing Network (RES), and other researchers. The assignment of an account and execution time requires a request form; contact us in case of doubt or for urgent requests.

Connecting to Altamira

Once you have a login and its associated password, you can access the Altamira system by connecting to the login node altamira1.ifca.es.

You must use Secure Shell (ssh) tools to log into Altamira or transfer files to it. We do not accept incoming connections using protocols such as telnet, ftp, rlogin, rcp, or rsh. Once you are logged into Altamira you cannot make outgoing connections, for security reasons.

For more information about the supported Secure Shell versions and how to get ssh for your system (including Windows systems), see Appendix A.

Here is an example of logging into Altamira from a UNIX environment:

user@localsystem:~$ ssh -l usertest altamira1.ifca.es
usertest@altamira1.ifca.es's password: 

/--------------------------------------------------------------\
|                      Welcome to Altamira                     |
|                                                              |
|  - Applications are located at /gpfs/res_apps                |
|  - For further information read User's Guide at              |
|      http://grid.ifca.es/wiki/Ojancano/Userguide             |
|                                                              |
\--------------------------------------------------------------/

[usertest@login1 ~]$ 

The first time you connect to the Altamira system, Secure Shell needs to exchange some initial information to establish the communication. This consists of accepting the RSA key of the remote host; you must answer 'yes' or 'no' to confirm the acceptance of this key.

Please change your initial password after you log into the machine for the first time. Also use a strong password (do not use a word or phrase from a dictionary and do not use a word that can be obviously tied to you). Finally, make a habit of changing your password on a regular basis.

If you cannot get access to the system after following this procedure, first consult Appendix A for extended information about Secure Shell, or contact us (see the Getting Help section).
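
As a convenience, you can add an entry to the ~/.ssh/config file on your local machine so that a short alias expands to the full host and user name. This is a minimal sketch; the alias "altamira" and the user name are illustrative:

# ~/.ssh/config on your local machine (alias and user name are illustrative)
Host altamira
    HostName altamira1.ifca.es
    User usertest

With this entry in place, "ssh altamira" behaves like the full ssh command shown above.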

Login Node

Once inside the machine you will be presented with a UNIX shell prompt and you will normally be in your home ($HOME) directory. If you are new to UNIX, you will have to learn the basics before you can do anything useful.

The machine you will be logged into is the login node of Altamira (login1). This machine acts as a front end, and it is typically used for editing, compiling, preparing/submitting batch executions, and as a gateway for copying data into or out of Altamira.

The execution of CPU-bound programs on this node is not permitted; if a compilation needs more CPU time than allowed, it must be done through the batch queue system. It is not possible to connect directly to the compute nodes from the login node; all resource allocation is done by the batch queue system.

Transferring Files

As mentioned before, no connections are allowed from inside Altamira to the outside world, so all scp and sftp commands have to be executed from your local machine, not from inside Altamira.

Here are some examples of using these tools to transfer files to Altamira:

localsystem$ scp localfile usertest@altamira1.ifca.es:
usertest@altamira1.ifca.es's password:

localsystem$ sftp usertest@altamira1.ifca.es
usertest@altamira1.ifca.es's password:
sftp> put localfile
sftp> exit

These are the ways to retrieve files from Altamira to your local machine:

localsystem$ scp usertest@altamira1.ifca.es:remotefile localdir
usertest@altamira1.ifca.es's password:

localsystem$ sftp usertest@altamira1.ifca.es
usertest@altamira1.ifca.es's password:
sftp> get remotefile
sftp> exit
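
To copy a whole directory tree you can use the recursive option of scp. This is a sketch; the directory name is illustrative:

localsystem$ scp -r localdir usertest@altamira1.ifca.es:
usertest@altamira1.ifca.es's password: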

On Windows systems, most Secure Shell clients come with a tool to make secure copies or secure FTP transfers. There are several tools that meet the requirements; please refer to Appendix A, where you will find the most common ones and examples of use.

File Systems

IMPORTANT It is your responsibility as a user of the Altamira system to back up all your critical data.

Each user has several areas of disk space for storing files. These areas may have size or time limits; please read this section carefully to learn about the usage policy of each of these filesystems. There are 3 different types of storage available inside a node:

  • Root Filesystem: Is the filesystem where the operating system resides
  • GPFS Filesystems: GPFS is a distributed networked filesystem which can be accessed from all the nodes
  • Local Hard Drive: Every compute node has an internal hard drive

Root Filesystem

The root filesystem is where the operating system is installed on each compute node. The use of /tmp for temporary user data is NOT permitted. The local hard drive can be used for this purpose, as described in the Local Hard Drive section.

Furthermore, the environment variable $TMPDIR is already configured so that applications use the local hard drive to store their temporary files.

GPFS Filesystems

The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system that can provide fast, reliable data access from all blades of the cluster to a global filesystem. GPFS allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted, while providing a high level of control over all file system operations. These are the recommended filesystems for most jobs, because GPFS provides high-performance I/O by "striping" blocks of data from individual files across multiple disks on multiple storage devices and reading/writing these blocks in parallel. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.

These are the GPFS filesystems available in Altamira from all nodes:

  • /gpfs/res_home: Soft link to a GPFS folder. This filesystem contains the home directories of all the users; when you log into Altamira you start in your home directory by default. Every user has their own home directory to store executables, their own developed sources and their personal data. Quotas are in effect that limit the amount of data that can be stored here; a default quota is enforced for all users.

If you need more disk space in this or any other GPFS filesystem, the person responsible for your project has to request the extra space, specifying the amount requested and the reasons it is needed. The request can be sent by email or any other means of contact to the user support team, as explained in the Getting Help section.

  • /gpfs/res_projects: In addition to the home directory, there is a directory in /gpfs/res_projects for each group of Altamira users. For instance, the group bsc01 will have a /gpfs/res_projects/bsc01 directory ready to use. This space is intended to store data that needs to be shared between the users of the same group or project. A quota per group will be enforced, depending on the space assigned by the Access Committee.

All the users of the same project share their common /gpfs/res_projects space, and it is the responsibility of each project manager to determine and coordinate the best use of this space and how it is distributed or shared between their users. If a project needs more disk space in this or any other GPFS filesystem, the project manager has to request the extra space, specifying the amount needed and the reasons it is needed. The request can be sent by email or any other means of contact to the user support team, as explained in the Getting Help section.

  • /gpfs/res_scratch: Each Altamira user has a directory under /gpfs/res_scratch; you must use this space to store temporary files of your jobs during their execution. By default, files may reside in this filesystem for up to 7 days without modification; any older file might be removed (see the example after this list). A quota per group will be enforced, depending on the space assigned.

  • /gpfs/res_apps: This filesystem holds the applications and libraries that have already been installed on Altamira. Take a look at the directories or go to the Software section to learn about the applications available for general use. Before installing any application needed by your project, first check whether it is already installed on the system. If an application you need is not on the system, you will have to ask our user support team to install it; check the Getting Help section for how to contact us. If it is a general application with no restrictions on its use, it will be installed in a public directory under /gpfs/res_apps, so all Altamira users can make use of it. If the application needs some type of license and its use must be restricted, a private directory under /gpfs/res_apps will be created, so only the authorized Altamira users can make use of this application. All applications in /gpfs/res_apps are installed, controlled and supervised by the user support team. This does not mean that users cannot help in this task; both can work together to get the best result. The user support team can provide its wide experience in compiling and optimizing applications on the Altamira cluster, and the users can provide their knowledge of the application to be installed. General applications that have been modified in some way from their normal behavior by the project users for their own study, and that may not be suitable for general use, must be installed under /gpfs/res_projects or /gpfs/res_home, depending on the usage scope of the application, but not under /gpfs/res_apps.
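
As an example of checking for files affected by the /gpfs/res_scratch time limit, you can list the files that have not been modified in the last 7 days with the standard find command. This is a sketch; adjust the path to match your own scratch directory:

[usertest@login1 ~]$ find /gpfs/res_scratch/$USER -type f -mtime +7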

Local Hard Drive

Every node has a local hard drive that can be used as local scratch space to store temporary files during the execution of your jobs. This space is mounted on the /scratch directory. The amount of space within the /scratch filesystem varies from node to node (depending on the total amount of disk space available). Data stored in these local hard drives at the compute nodes is not available from the login nodes. Local hard drive data is not removed automatically, so each job should remove its data when it finishes. Jobs should use the $TMPDIR environment variable, which is set to the local scratch folder for each job.
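
A typical pattern inside a job script is to copy the input data to $TMPDIR, run the program there and copy the results back before the job ends. This is a minimal sketch; the file names and the results directory are illustrative:

# Inside a job script; file names are illustrative
cp $HOME/input.dat $TMPDIR/
cd $TMPDIR
./my_binary input.dat > output.dat
cp output.dat $HOME/results/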

Running Jobs

SLURM is the utility used at Altamira for batch processing support, so all jobs must be run through it. This document provides information for getting started with job execution at Altamira.

In order to keep the load on the login nodes at a proper level, a 10-minute CPU time limit is set for processes running interactively on these nodes. Any execution taking longer than this limit should be carried out through the queue system.

Submitting Jobs

A job is the execution unit for SLURM. A job is defined by a text file containing a set of directives describing the job, and the commands to execute.

These are the basic directives to submit jobs:

  • mnsubmit <job_script> submits a job script to the queue system (see below for job script directives).

  • mnq shows all the jobs submitted.

  • mncancel <job_id> removes a job from the queue system, canceling its execution if it was already running.

Job directives

A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script, with the following syntax:

#@ directive = value

Additionally, the job script may contain a set of commands to execute. If not, an external script must be provided with the 'executable' directive. Here you may find the most common directives:

#@ class = class_name

The queue where the job is to be submitted. Leave this field empty unless you need to use the "debug" or other special queues.

#@ wall_clock_limit = HH:MM:SS

The limit of wall clock time. This is a mandatory field and you must set it to a value greater than the real execution time of your application and smaller than the time limits granted to the user. Note that your job will be killed once this time has elapsed.

#@ initialdir = pathname

The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.

#@ error = file

The name of the file to collect the stderr output of the job.

#@ output = file

The name of the file to collect the standard output (stdout) of the job.

#@ total_tasks = number

The number of processes to start.

#@ cpus_per_task = number

The number of cpus allocated for each task. This is useful for hybrid MPI+OpenMP applications, where each process spawns a number of threads. The number of cpus per task must be between 1 and 16, since each node has 16 cores (one core per thread).

#@ tasks_per_node = number

The number of tasks allocated on each node. When an application uses more than 3.8 GB of memory per process, it is not possible to fit 16 processes in a single node with its 64 GB of memory. This directive can be combined with cpus_per_task to allocate nodes exclusively; for instance, to run 2 processes per node on a fully allocated node, set tasks_per_node = 2 and cpus_per_task = 8. The number of tasks per node must be between 1 and 16.

#@ gpus_per_node = number

The number of GPU cards assigned to the job. This number can be [0,1,2] as there are 2 cards per node.

There are also a few SLURM environment variables you can use in your scripts:

  • SLURM_JOBID: Specifies the job ID of the executing job
  • SLURM_NPROCS: Specifies the total number of processes in the job
  • SLURM_NNODES: The actual number of nodes assigned to run your job
  • SLURM_PROCID: Specifies the MPI rank (or relative process ID) for the current process. The range is from 0 to SLURM_NPROCS-1
  • SLURM_NODEID: Specifies the relative node ID of the current job. The range is from 0 to SLURM_NNODES-1
  • SLURM_LOCALID: Specifies the node-local task ID for the process within a job
  • SLURM_NODELIST: Specifies the list of nodes on which the job is actually running
  • SLURM_MEM_PER_CPU: Memory available per CPU used
  • TMPDIR: Folder in the node to use as temporary storage (DO NOT use /tmp)
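
For example, a job script can log some of these variables at the start of the run. This is a small sketch; the echo lines are illustrative:

# At the beginning of a job script (illustrative)
echo "Job $SLURM_JOBID running $SLURM_NPROCS processes on $SLURM_NNODES nodes"
echo "Nodes: $SLURM_NODELIST"
echo "Temporary storage: $TMPDIR"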

Job Examples

In the examples, the %j part in the job directives will be substituted by the job ID.

Example for a sequential job:

#!/bin/bash
#@ job_name = test_serial
#@ initialdir = .
#@ output = serial_%j.out
#@ error = serial_%j.err
#@ total_tasks = 1
#@ wall_clock_limit = 00:02:00

./serial_binary

Example for a parallel job:

#!/bin/bash
#@ job_name = test_parallel
#@ initialdir = .
#@ output = mpi_%j.out
#@ error = mpi_%j.err
#@ total_tasks = 32
#@ wall_clock_limit = 00:02:00

srun ./parallel_binary

Example for a GPGPU job:

#!/bin/bash
#@ job_name = test_gpu
#@ initialdir = .
#@ output = gpu_%j.out
#@ error = gpu_%j.err
#@ total_tasks = 1
#@ gpus_per_node = 1
#@ wall_clock_limit = 00:02:00

./gpu_binary

Jobs that use GPUs should execute "module load CUDA" in order to set the library paths before running mnsubmit.
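
For hybrid MPI+OpenMP applications, the total_tasks, tasks_per_node and cpus_per_task directives described above can be combined. The following is a sketch rather than a tested script; the binary name, the task counts and the OMP_NUM_THREADS setting are illustrative:

#!/bin/bash
#@ job_name = test_hybrid
#@ initialdir = .
#@ output = hybrid_%j.out
#@ error = hybrid_%j.err
#@ total_tasks = 4
#@ tasks_per_node = 2
#@ cpus_per_task = 8
#@ wall_clock_limit = 00:10:00

export OMP_NUM_THREADS=8
srun ./hybrid_binary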

Software

Modules Environment

The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application or a compilation. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl.

The most important commands of the module tool are: list, avail, load, unload, switch and purge.

  • module list shows all the modules you have loaded:

[usertest@login1 ~]$ module list
Currently Loaded Modulefiles:
  1) gcc/4.6.3   2) GHC/7.4.2   3) openmpi-x86_64
  • module avail shows all the modules that the user is able to load:

[usertest@login1 ~]$ module avail
---------------------------- /usr/share/Modules/modulefiles --------------------------
dot         module-cvs  module-info modules     null        use.own

---------------------------------- /etc/modulefiles ----------------------------------
mvapich2-x86_64 openmpi-x86_64

------------------------------------- compilers --------------------------------------
GHC/7.4.1          GHC/7.4.2(default) gcc/4.6.3(default)

------------------------------------ applications ------------------------------------
CMAKE/2.8.7(default)      R/2.15.1(default)         SOAPdenovo/1.05(default)
MIRA/3.4.0.1(default)     SIESTA/3.1(default)       TRINITY_RNA_SEQ/r2012-06-8(default)
  • module load lets the user load the necessary environment variables for the selected modulefile (PATH, MANPATH, LD_LIBRARY_PATH, etc.)

[usertest@login1 ~]$ module load CMAKE
load CMAKE/2.8.7 (PATH,MANPATH)
  • module unload removes all environment changes made by the module load command:

[usertest@login1 ~]$ module unload GHC
remove GHC/7.4.2 (PATH,LD_LIBRARY_PATH,MANPATH)
  • module switch acts as the module unload and module load commands at the same time:

[usertest@login1 ~]$ module load GHC
load GHC/7.4.2 (PATH,LD_LIBRARY_PATH,MANPATH)
[usertest@login1 ~]$ module switch GHC GHC/7.0.1
switch1 GHC/7.4.2 (PATH,LD_LIBRARY_PATH,MANPATH)
switch2 GHC/7.0.1 (PATH,LD_LIBRARY_PATH,MANPATH)
switch3 GHC/7.4.2 (PATH,LD_LIBRARY_PATH,MANPATH)
ModuleCmd_Switch.c(278):VERB:4: done

Job submitting with Modules

Load the needed applications before submitting the jobs that require them. The module load [app] command only needs to be executed once per session. E.g.:

module load APP1
module load APP2
mnsubmit test_jobA.cmd
mnsubmit test_jobB.cmd
mnsubmit test_jobC.cmd

Acknowledgment in publications

Please acknowledge IFCA at the University of Cantabria for the use of the Altamira supercomputer with a text similar to:

'We acknowledge the Santander Supercomputacion support group at the University of Cantabria, which provided access to the Altamira Supercomputer at the Institute of Physics of Cantabria (IFCA-CSIC), member of the Spanish Supercomputing Network, for performing simulations/analyses.'

Getting Help

IFCA provides consulting assistance to users. User support consultants are available during normal business hours, Monday to Friday, from 9:00 to 17:00 (CEST).

User questions and support are handled at:

If you need assistance, please supply us with the nature of the problem, the date and time that the problem occurred, and the location of any other relevant information, such as output or log files.

Appendices

A. SSH

SSH is a program that enables secure logins over an insecure network. It encrypts all the data passing both ways, so that if it is intercepted it cannot be read. It also replaces old and insecure tools such as telnet, rlogin, rcp, ftp, etc. SSH is client-server software; both machines must have ssh installed for it to work.

We have already installed an ssh server on our machines. You must have an ssh client installed on your local machine. SSH is available free of charge for almost all versions of Unix. We recommend the OpenSSH client, which can be downloaded from http://www.openssh.org, but any client compatible with SSH version 2 can be used.

On Windows systems we recommend the use of PuTTY, a free SSH client that you can download from http://www.putty.nl/, but any client compatible with SSH version 2 can be used.

To transfer files to or from Altamira you need a secure ftp (sftp) or secure copy (scp) client. There are several different clients but, as previously mentioned, we recommend the PuTTY tools for transferring files: psftp and pscp. You can find them on the same web page as PuTTY ( http://www.putty.nl/ ).

Some other possible tools for users requiring graphical file transfers could be:

To use psftp you need to pass it the machine name (altamira1.ifca.es), and then your username and password. Once you are connected, it works like a Unix command line. With the help command you will obtain a list of all possible commands, but the most useful are:

get file_name
To transfer from Altamira to your local machine.
put file_name
To transfer a file from your local machine to Altamira.
cd directory
To change remote working directory.
dir
To list contents of a remote directory.
lcd directory
To change local working directory.
!dir
To list contents of a local directory.
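
A short psftp session from a Windows command prompt might look like the following sketch; the file names are illustrative:

C:\> psftp usertest@altamira1.ifca.es
usertest@altamira1.ifca.es's password:
psftp> put localfile
psftp> get remotefile
psftp> quit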

B. Using MVAPICH2

Applications that require MPI can choose to use the MVAPICH2 implementation. It is estimated that MVAPICH2 is 20% faster than OpenMPI over an Infiniband network.

To use MVAPICH2, you should load its environment with:

$ module load mvapich2-x86_64

To work with SLURM, applications that use MVAPICH2 must link against the "pmi" library, which is located in the /opt/perf/lib/ folder. For example, add -L/opt/perf/lib/ -lpmi to the link step.
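
As an illustration, an MPI program written in C could be compiled and linked as follows. This is a sketch; the source and binary names are illustrative, and mpicc is assumed to be the compiler wrapper provided by the mvapich2-x86_64 module:

$ module load mvapich2-x86_64
$ mpicc -o parallel_binary parallel_binary.c -L/opt/perf/lib/ -lpmi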

C. Memory Consumption of Programs

You can measure how much heap memory your program uses with Valgrind. To gather heap profiling information about the program prog, type:

module load VALGRIND
valgrind --tool=massif prog

The program will execute. All of Valgrind's profiling data is written to a file called massif.out.<pid>, where <pid> is the process ID.

To see the information gathered by Valgrind in an easy-to-read form, use ms_print. If the output file's name is massif.out.12345, type:

ms_print massif.out.12345
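
Massif output is easier to interpret when the program is compiled with debugging information. This is a generic sketch, assuming a C program built with gcc (the file names are illustrative):

$ gcc -g -O2 -o prog prog.c
$ module load VALGRIND
$ valgrind --tool=massif ./prog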

D. Sharing Group Files

Several groups receive assigned time under different UNIX groups. When a group wants to share its data with other, related groups, it is possible to set additional permissions using GPFS ACLs (access control lists).

In order to edit the ACL permissions of a GPFS folder, you should run the command:

EDITOR=/usr/bin/vim /usr/lpp/mmfs/bin/mmedacl <folder>

Then, in the list of permissions, you should add the new group (or user, if you prefer). In the example, the last two lines add the group uc09 to a folder owned by uc07.

#owner:uc07003
#group:uc07
user::rwxc
group::----
other::----
mask::rwx-
group:uc09:rwx-
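
To verify the resulting permissions you can display the ACL of the folder with mmgetacl, which is part of the standard GPFS tools (the path is assumed to be the same as for mmedacl):

/usr/lpp/mmfs/bin/mmgetacl <folder>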

E. Available Software

Current List of Available Software
