location: Diff for "Cluster/Usage"
Differences between revisions 67 and 85 (spanning 18 versions)
Revision 67 as of 2012-03-02 09:21:34
Size: 24689
Editor: enol
Comment:
Revision 85 as of 2017-02-17 08:58:20
Size: 6968
Editor: aloga
Comment:
Line 8: Line 8:
If you find any information that is out-dated, incorrect or incomplete, do not hesitate to [[#Support|Open a ticket]]. If you find any information that is out-dated, incorrect or incomplete, do not
hesitate to [[#Support|Open a ticket]].
Line 10: Line 11:
Line 11: Line 13:
The '''GRIDUI''' (Grid User Interface) cluster is the interactive gateway to the [[http://grid.ifca.es|Advanced Computing and e-Science]] resources at IFCA. This cluster is formed by several identical Scientific Linux 5 hosts, that can be reached through a single entry point. The connections to the internal machines are managed by a director node that tries to ensure that proper balancing is made across the available nodes at a given moment. Nevertheless, direct access to a particular node can be obtained.
Line 13: Line 14:
Please note that this cluster is ''not intended for the execution of CPU intensive tasks'', for this purpose use any of the available computing resources. Every process spawned are limited to a maximum CPU time of 2 hours. The '''GridUI''' (Grid User Interface) cluster is the interactive gateway to
the [[http://grid.ifca.es|Advanced Computing and e-Science]] resources at
IFCA. This cluster is comprised of a pool of machines reachable through a
single entry point. The connections to the internal machines are managed by
a director node that tries to ensure that proper balancing is made across
the available nodes at a given moment.
Line 15: Line 21:
Login on these machines is provided via [[http://en.wikipedia.org/wiki/Secure_Shell|Secure Shell]]. The RSA key fingerprint of them is `46:85:91:c1:eb:61:55:34:25:2c:d6:0a:08:22:1f:77`. Please note that this cluster is ''not intended for the execution of CPU
intensive tasks'', for this purpose use any of the available computing
resources. Every process spawned is limited to a maximum CPU time of 2 hours.
Line 17: Line 25:
Outgoing SSH connections are not allowed by default from this cluster. Inactive SSH sessions will be closed after 12h. Login on these machines is provided via
[[http://en.wikipedia.org/wiki/Secure_Shell|Secure Shell]]. Outgoing SSH
connections are not allowed by default from this cluster. Inactive SSH
sessions may be closed after 12h. It is highly recommended that you set up
[[Cluster/Usage/SSHKeyManagement| SSH Keys]] for authentication, instead of
using your username and password.
Line 19: Line 32:
{{{#!wiki caution
'''Direct node access'''
  || '''Hostname''' || '''Operating System''' || '''SSH server key fingerprint''' ||
  || `gridui.ifca.es`, `griduisl6.ifca.es` || Scientific Linux 6.X || `29:80:9b:28:e7:8a:00:fe:6c:60:ef:e6:a6:71:33:bd` ||
Line 22: Line 35:
Note that even though is possible to bypass the director accessing a node directly this is ''not recommended nor advisable''. Fair usage of the resources and proper balancing of the connections cannot be guaranteed if any user commits any abuse of this feature. == Authentication and user accounts ==
Line 24: Line 37:
Please note also that the GRIDUI machines have exactly the same hardware and software.
}}}
{{{#!wiki warning
'''Scientific Linux 4 cluster'''
See [[Cluster/SSO]].
Line 29: Line 39:
Note that from 3rd November 2010 no Scientific Linux Cern 4 version is available.
}}}
{{{#!rhtml
<table border="1" cellpadding="2" cellspacing="0" align="center">
<tr align=center bgcolor=#A0A0A0><td>Hostname</td><td>Port</td><td>Gives access to</td><td>Distribution and Architecture</td></tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22, 22000</td>
    <td rowspan="2">Balanced GRIDUI Cluster</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22001</td>
    <td rowspan="2">gridui01.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22002</td>
    <td rowspan="2">gridui02.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22003</td>
    <td rowspan="2">gridui03.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22004</td>
    <td rowspan="2">gridui04.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22005</td>
    <td rowspan="2">gridui05.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
<tr>
    <td><strong>gridui.ifca.es</strong></td>
    <td rowspan="2">22006</td>
    <td rowspan="2">gridui06.ifca.es</td>
    <td rowspan="2" align="center">Scientific Linux CERN SLC release 5.5 (Boron), x86_64</td>
</tr>
<tr>
    <td><strong>griduisl5.ifca.es</strong></td>
</tr>
</table>
}}}
== Authentication ==
Authentication is centralized via secured LDAP. All the changes made to a user account in one node take immediate effect in the whole cluster. There is also a secured web interface, that allows a user to change his/her details, available at https://cerbero.ifca.es/. If you need to reset your account password, please contact the system administrators.
== Access to Scientific Linux 5 machines ==
Line 102: Line 41:
With this username you should be able to access also the ticketing system at http://support.ifca.es/

== Grid resources ==
As each supported virtual organization uses its own set of resources, it is commonly necessary to set up the correct environment variables in order to use the grid tools. Environment scripts are located under `/nfs4/usr/etc/env/`:
After the
[[https://grid.ifca.es/sl5-user-interfaces-deprecation-plan2.html|Scientific
Linux 5 deprecation]] interactive access to Scientific Linux 5 is still
possible through the batch system. In order to request an SLC5 machine you must
append the complex `scientificlinux5` to your request:
Line 108: Line 48:
$ source /nfs4/usr/etc/env/ibergrid-env.sh
user@cloudprv-10-0:~ $ qsub -l scientificlinux5=true (...)
Line 110: Line 51:
== PBS Cluster ==
PBS cluster was '''decommissioned''' in December 2009. Please use the new [[#SGE_Cluster|SGE Cluster]] instead.

If you want an interactive session, append the complex to your `qlogin` request:

{{{#!highlight console numbers=disable

user@cloudprv-10-0:~ $ qlogin -l scientificlinux5=true (...)
JSV "/nfs4/opt/gridengine/util/resources/jsv/jsv-IFCA.tcl" has been started
JSV "/nfs4/opt/gridengine/util/resources/jsv/jsv-IFCA.tcl" has been stopped
Your job 1822278 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1822278 has been successfully scheduled.
Establishing builtin session to host cloudprv-02-9.ifca.es ...
user@cloudprv-02-9:~$ cat /etc/redhat-release
Scientific Linux SL release 5.5 (Boron)
user@cloudprv-02-9:~$
}}}
Line 114: Line 69:
The SGE Cluster is based on `Scientific Linux CERN SLC release 5.5` machines, running on x86_64. The exact number or resources available is shown on the [[http://monitor.ifca.es/ganglia/?c=SGE%20Worker%20Nodes&m=&r=hour&s=descending&hc=4|monitorization page]].
Line 116: Line 70:
Local submission is allowed for certain projects. As stated below, there are some shared areas that can be accessed from the computing nodes. The underlying batch system is [[http://wikis.sun.com/display/GridEngine|Sun Grid Engine]] 6.2u5. Please note that the syntax for job submission (`qsub`) and monitoring (`qstat`) is similar to the one you might be accustomed to from PBS, but there are important differences. Refer to the following sources for information: The SGE Cluster is based on `Scientific Linux CERN SLC release 6.2` machines, running on x86_64.
Line 118: Line 72:
 * Using Grid Engine [[http://wikis.sun.com/display/GridEngine/Using+Sun+Grid+Engine|official documentation]]. Local submission is allowed for certain projects. As stated below, there are some shared areas that can be accessed from the computing nodes. The underlying batch system is [[https://arc.liv.ac.uk/trac/SGE|Son of Grid Engine]] 8.0.0d. Refer to the following sources for information:

 * Using [[GridEngine]].
Line 122: Line 78:
For the sake of clarity, the following examples show only the options relevant to each example. However, keep in mind that some options are mandatory (for example, the project submission). {{{#!wiki important
'''IFCA Gridengine documentation has moved'''
Line 124: Line 81:
=== Job submission, projects and queues ===
Users should submit their jobs directly to their project by using `qsub -P <project>`.

{{{#!highlight console numbers=disable
$ qsub -P <project> <jobfile>
}}}
You can get the list of '''projects''' you are '''allowed''' to use by issuing:

{{{#!highlight console numbers=disable
$ /nfs4/usr/bin/rep-sge-prjs.py
}}}
{{{#!wiki warning
'''Job submission without specifying a project is not allowed'''

You should contact your supervisor if you are unsure about your project.
}}}
A scratch area is defined for every job as the environment variable `$TMPDIR`. This area is cleared after the job has exited.

{{{#!wiki caution
'''Do not specify any queues'''

Although it is possible to specify a queue in your job submission, it is not recommended to do so. Access to certain queues is possible only if the user and/or project has special privileges, so if you make a hard request for a given queue, your job surely won't be properly scheduled (it could even starve).
}}}
=== Notifications ===
Users can get email notifications whenever a job changes its status. A valid email address must be specified, using the `-M` option, as well as the kind of notification using the `-m` option. Valid values are:

 * `b`: Mail is sent at the beginning of the job.
 * `e`: Mail is sent at the end of the job.
 * `a`: Mail is sent when the job is aborted or rescheduled.
 * `s`: Mail is sent when the job is suspended.
 * `n`: No mail is sent.

Please note that `-m` can be specified several times, so the following job submission

 . {{{
$ qsub -m b -m e -M user@example.org <jobfile>
The specific documentation for IFCA has been moved [[/GridEngine|to a separate section]].
Line 162: Line 84:
will produce an email when the job has started and ended. == Shared areas ==
Line 164: Line 86:
=== Specifying resources ===
To get your submitted jobs executed quickly (and to benefit from any possible backfilling), you should tune the resources that your job requests. The more closely your request matches your job's actual bounds, the sooner your job will run. The default values are deliberately high to prevent jobs from being killed by the batch system, so they heavily penalize job execution.

Some of these limits are defined in the `$SGE_ROOT/default/common/sge_request` file. If your application is always expected to use the same values, you can override that file by creating a `$HOME/.sge_request` file. For further details, please check the `sge_request` manual page.

==== Wall Clock time ====
A default wall clock time of 72 hours is enforced for all jobs submitted to the cluster. Should you require a higher value, set it yourself by requesting a new `h_rt` value in the form `hours:minutes:seconds`. Please note that requesting a high value may negatively affect your job's scheduling and execution, so try to be as accurate as possible when setting this value. For example, a job requiring 22 hours, plus two extra hours as a safety margin, should be submitted as follows:

{{{#!highlight console numbers=disable
$ qsub -l h_rt=24:00:00 <jobfile>
}}}
==== Memory management ====
When requesting memory for a job you must take into account that per-job memory is limited in the default queues to a [[http://en.wikipedia.org/wiki/Resident_set_size|Resident Set Size]] (h_rss) of 5 GB. If you need to use more memory, you should request the special resource `highmem`. Please notice that your group may not be able to request that flag by default. If you need to do so, please [[http://grid.ifca.es/wiki/Cluster/Usage#Support|open a ticket]] requesting it. Also, notice that these nodes might be overloaded by other users requesting the same flag, so use it wisely.

It is '''highly recommended''' that you tune your memory requirements to some realistic values. Special emphasis is made in the following resources:

 * `h_rss`
 * `mem_free`

===== h_rss =====
This limit refers to the '''hard resident set size limit'''. The batch system will make sure a given job does not consume more memory than the value assigned to that variable. This means that '''any job above the requested `h_rss` limit will be killed (SIGKILL) by the batch system'''. It is recommended to request this resource as a top limit for your application. If you expect your job to consume no more than a peak value of 3GB you should request those 3GB as its resident set size limit. This request '''shall not produce a penalty''' on the scheduling of your jobs.

===== mem_free =====
This refers to the free RAM necessary for the job to run. The batch system will allow jobs to run only if sufficient memory (as requested by `mem_free`) is available for them on a given node. It will subtract that amount of memory from the available resources once the job is running. This ensures that a node with 16 GB of memory will not run jobs totaling more than 16 GB. The default value is 1.8 GB per slot. Please note that exceeding the `mem_free` limit will not automatically kill your job; its aim is only to try to ensure that the memory you requested is available to your job. Also note that this value is not intended to reflect the memory peaks of your job. This request will impact the scheduling of your jobs, so it is highly recommended to tune it to fit your application's actual memory usage.

===== Memory usage above 5G =====
For serial jobs requiring more than 5 GB of memory, submission requesting the '''''highmem''''' flag is necessary. Using this flag, the `h_rss` limit will be unset, but the requirement tuning described above still applies. If your group is allowed to request it, and your job needs 20GB of memory, you can request it as follows:

{{{#!highlight console numbers=disable
$ qsub -l highmem,mem_free=20G <jobfile>
}}}
 * A job that might reach 30 GB and needs 20GB will be submitted as:

 . {{{#!highlight console numbers=disable
$ qsub -l highmem,h_rss=30G,mem_free=20G <jobfile>
}}}

For jobs needing more than these 5G using MPI, please refer to the [[#Parallel_jobs|Parallel job submission]] section.

===== Examples =====
 * A job that needs to have 4 GB of memory assigned to it:

 . {{{#!highlight console numbers=disable
$ qsub -l mem_free=4G <jobfile>
}}}
 * A job that might peak at 4 GB, but in its execution normally needs 3 GB:

 . {{{#!highlight console numbers=disable
$ qsub -l h_rss=4G,mem_free=3G <jobfile>
}}}
 * A job that might reach 4 GB, and also needs 4 GB:

 . {{{#!highlight console numbers=disable
$ qsub -l h_rss=4G,mem_free=4G <jobfile>
}}}

==== Infiniband ====
If you are executing MPI parallel jobs you may benefit from the Infiniband interconnection available on the nodes. In order to do so, you must request the special resource `infiniband`:

{{{#!highlight console numbers=disable
$ qsub -l infiniband <jobfile>
}}}
##==== Scratch space ====
##
##The scratch area for the jobs submitted to the cluster is located under `/tmp/` and is pointed by the `$TMPDIR` variable.
##
##By default, jobs request a 2GB scratch area. Should you need more space, please use the `scratch_space` in your resource requirements:
##
## $ qsub -l scratch_space=20G
##
##Please note that this space is dynamic. To check the current disk usage in the nodes you can issue:
##
## $ qhost -F scratch_space
##-->
=== Job reservation ===
It is possible to indicate whether a reservation should be made for a job by using the `-R y` option. When a runnable job cannot be started due to a shortage of resources, a reservation can be scheduled instead. This is especially useful for parallel jobs, or jobs requesting bottleneck resources (for example, a large amount of memory), and it is of no use for normal sequential jobs.

{{{#!highlight console numbers=disable
$ qsub -R y <jobfile>
}}}
=== Parallel jobs ===
Parallel jobs must be submitted to a parallel environment (pe), specifying the number of slots required. Depending on the used pe, SGE will allocate the slots in a different way.

{{{#!highlight console numbers=disable
$ qsub -pe mpi 8 <jobfile>
}}}
Please note that parallel jobs will be routed to queue ''parallel'' (see [[#Memory_management|previous section]]). Also note that access to that queue is restricted to groups having requested it beforehand.

The following parallel environments are available:

{{{#!rhtml
<table border="1" cellpadding="2" cellspacing="0" align="center">
<tr align=center bgcolor=#A0A0A0>
    <td>PE Name</td>
    <td>Node distribution</td>
</tr>
<tr>
    <td>smp</td>
    <td>All slots in just 1 node</td>
</tr>
<tr>
    <td>mpi</td>
    <td>All slots spread across available nodes.</td>
</tr>
<tr>
    <td>8mpi</td>
    <td>All slots spread across available nodes, 8 slots on each node. The number of slots requested must be multiple of 8.</td>
</tr>
</table>
}}}

The actual command line for the execution of the parallel application depends on the parallel library/framework that it uses for communication between the allocated slots. At IFCA, [[http://www.open-mpi.org/|Open MPI]] v.1.4 is configured in the default user environment and is installed at `/usr/lib64/openmpi/1.4-gcc`. Open MPI includes tight integration with the batch system, so running an application with as many processes as allocated slots does not require any special arguments for mpiexec. The following example executes an 8-process application:
{{{
#$ -pe mpi 8

# be sure to include the complete path or
# invoke mpiexec from the correct directory
mpiexec /path/to/your/application

# This is equivalent to:
# mpiexec -np $NSLOTS /path/to/your/application
# ($NSLOTS is defined by SGE and is the number of allocated slots)
}}}

Better control of the processes started can be achieved with several mpiexec/mpirun parameters, such as `-np`, which sets the total number of processes to start, or `-npernode`, which fixes the number of processes per available node.

Open MPI will try to use the best available communication network at runtime. To restrict the communication method you may use the `--mca btl` parameter of mpiexec. '''Forcing a communication network may make your application unable to run'''; Open MPI automatically selects the best communication method for you. For a list of available communication methods, use the `ompi_info` command as shown:
{{{
$ ompi_info | grep btl
                 MCA btl: ofud (MCA v2.0, API v2.0, Component v1.4)
                 MCA btl: openib (MCA v2.0, API v2.0, Component v1.4)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.4)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.4)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4)
}}}

For example to avoid the use of tcp network, you could use the following command line:
{{{
# ^tcp means anything but tcp
mpiexec --mca btl ^tcp /path/to/your/application
}}}
More examples can be found in the [[http://www.open-mpi.org/faq/?category=tuning#selecting-components|Open MPI FAQ]].

[[Middleware/MpiStart|mpi-start]] is also installed at the cluster. It may be useful if you are testing more than one MPI implementation or submitting jobs via grid. Check its [[ Middleware/MpiStart/UserDocumentation|user documentation]] for more information.



=== Interactive jobs ===
Interactive, short-lived, high-priority jobs can be sent if your project has permission to do so (see the SUBMIT manual page (`man submit`)). This kind of job can only request a '''maximum of 1h of WALL clock time'''; see the [[#Wall_Clock_time|previous section]] for details about limiting the wall clock time of a job.

X11 forwarding is possible when using the `qlogin` command. X11 forwarding requires a valid DISPLAY, so use `ssh -X` or `ssh -Y` to enable it in your SSH session when logging in to the UI.

{{{#!highlight console numbers=disable
$ qlogin -P <project> -l h_rt=1:00:00
}}}
=== Pseudo-Interactive jobs ===
A special resource called `immediate` is available for some users who need fast scheduling for their short-lived batch jobs. This kind of job can only request a '''maximum of 1h of WALL clock time'''.

{{{#!highlight console numbers=disable
$ qsub -l immediate <jobfile>
}}}
Please note that you might not have access to these resources.

=== Resource quotas ===
Some limits may be enforced by the administrators on a per-user, per-group or per-project basis. To check the current resource quotas, issue the following command:

{{{#!highlight console numbers=disable
$ qconf -srqs
}}}
To see the current usage of the quotas defined above, use the `qquota` command:

{{{#!highlight console numbers=disable
$ qquota -P <project>
}}}
=== Advanced reservation ===
Some users and/or projects might request a reservation of a set of resources in advance. This is called an "Advanced Reservation" (AR). If your project needs such a reservation you should submit a request through the [[http://support.ifca.es/|support helpdesk]]. You need to specify the following:

 * Start datetime, and end datetime (or duration) of the reservation.
 * Duration of your job(s) (i.e. `h_rt` for the individual jobs).
 * Computational resources needed (mem_free, number of slots).

Once the request has been made, the system administrators will give you the ID(s) of the AR created. You can submit your jobs whenever you want by issuing:

{{{#!highlight console numbers=disable
$ qsub -ar <reservation_id> <other_job_options>
}}}
You can submit your job(s) before the AR starts and also once it has started. However, you should pay attention to the duration of both the reservation and your job. If your job's execution exceeds either the `h_rt` that it requested or the duration of the AR, it will be killed by the batch system.

You should also take into account that your reservation might not be created on the date and at the time that you requested if there are no resources available. In this case, it will be created as soon as possible. To avoid this, please request your reservations well in advance.

Since the requested and reserved resources cannot be used for other jobs, those requested resources will be used for accounting purposes as if they were resources used by normal jobs (even in the case that the AR is unused). '''Please request only the resources that you need'''.

To query the existing advance reservations, you can use the `qrstat` command. To query a specific advance reservation, you can issue:

{{{#!highlight console numbers=disable
$ qrstat -ar <reservation_id>
}}}
== Shared areas ==
The `$HOME` directories (`/home/$USER`) are shared between the UIs and the computing nodes. There is a ''projects'' shared area (located at `/gpfs/csic_projects/`), also accessible from the UI and the computing nodes. If your group does not have this area, please open an [[http://support.ifca.es|Incidence ticket]].
The `$HOME` directories are shared between the UIs and the computing nodes. There is a ''projects'' shared area (located at `/gpfs/csic_projects/`), also accessible from the UI and the computing nodes. If your group does not have this area, please open an [[http://support.ifca.es|Incidence ticket]].
Line 365: Line 89:
Line 370: Line 95:
Line 392: Line 118:
Some extra packages as latest [[http://python.org|Python]] versions and [[http://software.intel.com/en-us/articles/non-commercial-software-development/|Intel Non-Commercial Compilers]] can be found at `/nfs4/opt/`. Here also is the preferred location for some other piece of software commonly used like:
Line 394: Line 119:
 * Matlab's like `octave` language for numerical anaylisis. Some extra packages can be found at `/nfs4/opt/`. This is the location for
some pieces of software commonly used like:

 * Matlab's like `octave` language for numerical analysis.
Line 396: Line 124:
 * Profiling and debugging `valngrid` tools.  * Profiling and debugging `valgrind` tools.
Line 401: Line 129:
Line 406: Line 135:
CategoryUserSupport CategoryLocalCluster CategoryUserSupport

IFCA Datacenter usage guidelines

If you find any information that is out-dated, incorrect or incomplete, do not hesitate to Open a ticket.

1. Introduction

The GridUI (Grid User Interface) cluster is the interactive gateway to the Advanced Computing and e-Science resources at IFCA. This cluster consists of a pool of machines reachable through a single entry point. The connections to the internal machines are managed by a director node that tries to balance them properly across the nodes available at a given moment.

Please note that this cluster is not intended for the execution of CPU-intensive tasks; for that purpose, use any of the available computing resources. Every process spawned is limited to a maximum CPU time of 2 hours.

Login on these machines is provided via Secure Shell. Outgoing SSH connections are not allowed by default from this cluster. Inactive SSH sessions may be closed after 12h. It is highly recommended that you set up SSH Keys for authentication, instead of using your username and password.

|| Hostname || Operating System || SSH server key fingerprint ||
|| gridui.ifca.es, griduisl6.ifca.es || Scientific Linux 6.X || 29:80:9b:28:e7:8a:00:fe:6c:60:ef:e6:a6:71:33:bd ||

2. Authentication and user accounts

See Cluster/SSO.

3. Access to Scientific Linux 5 machines

After the [[https://grid.ifca.es/sl5-user-interfaces-deprecation-plan2.html|Scientific Linux 5 deprecation]], interactive access to Scientific Linux 5 is still possible through the batch system. In order to request an SLC5 machine you must append the complex scientificlinux5 to your request:

user@cloudprv-10-0:~ $ qsub -l scientificlinux5=true (...)

If you want an interactive session, append the complex to your qlogin request:

user@cloudprv-10-0:~ $ qlogin -l scientificlinux5=true (...)
JSV "/nfs4/opt/gridengine/util/resources/jsv/jsv-IFCA.tcl" has been started
JSV "/nfs4/opt/gridengine/util/resources/jsv/jsv-IFCA.tcl" has been stopped
Your job 1822278 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1822278 has been successfully scheduled.
Establishing builtin session to host cloudprv-02-9.ifca.es ...
user@cloudprv-02-9:~$ cat /etc/redhat-release
Scientific Linux SL release 5.5 (Boron)
user@cloudprv-02-9:~$  
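
For non-interactive work, the same complex can be embedded in the job script itself using a `#$` directive line, so it does not have to be repeated on the command line. The following is only a minimal sketch: the function name `print_release` is hypothetical, and the script simply prints the release string so you can confirm which distribution the job landed on.

```shell
#!/bin/sh
# Embedded request: ask the batch system for a Scientific Linux 5 machine.
#$ -l scientificlinux5=true

# print_release (hypothetical helper): print the distribution release string,
# or a short note if the release file cannot be read.
print_release() {
    release_file=${1:-/etc/redhat-release}
    if [ -r "$release_file" ]; then
        cat "$release_file"
    else
        echo "no release file at $release_file"
    fi
}

print_release
```

Submit the script with `qsub` together with whatever other options your submission requires. On a Scientific Linux 5 node the job output should contain the same release line shown in the interactive session above.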

4. SGE Cluster

The SGE Cluster is based on Scientific Linux CERN SLC release 6.2 machines, running on x86_64.

Local submission is allowed for certain projects. As stated below, there are some shared areas that can be accessed from the computing nodes. The underlying batch system is Son of Grid Engine 8.0.0d. Refer to the following sources for information:

IFCA Gridengine documentation has moved

The specific documentation for IFCA has been moved to a separate section.

5. Shared areas

The $HOME directories are shared between the UIs and the computing nodes. There is a projects shared area (located at /gpfs/csic_projects/), also accessible from the UI and the computing nodes. If your group does not have this area, please open an Incidence ticket.

5.1. Usage

The shared directories are not intended for scratch use; use the temporary areas of the local filesystems instead. In other words, instruct every job you send to copy its input from the shared directory to the local scratch area ($TMPDIR), execute all operations there, and then copy the output back to a shared area where you can conveniently retrieve it from the UI.

Note that the contents of $TMPDIR are removed after job execution.
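
The copy-in, compute, copy-out pattern described in this section could be sketched as a small shell function for use inside a job script. This is an illustration under assumptions, not an IFCA-provided tool: the name `run_staged` and its argument order are hypothetical, and outside the batch system (where $TMPDIR may be unset) it falls back to /tmp.

```shell
#!/bin/sh
# run_staged INPUT_DIR OUTPUT_DIR COMMAND [ARGS...]   (hypothetical helper)
# Copies INPUT_DIR into a private directory on local scratch, runs COMMAND
# there, then copies the whole working directory (inputs and results) to
# OUTPUT_DIR and removes the scratch copy.
run_staged() {
    input_dir=$1
    output_dir=$2
    shift 2

    # Private working directory under the job's scratch area.
    work=$(mktemp -d "${TMPDIR:-/tmp}/staged.XXXXXX") || return 1
    mkdir -p "$output_dir" || return 1

    # Stage in: shared area -> local scratch.
    cp -R "$input_dir"/. "$work"/ || return 1

    # Run the command from inside the scratch directory.
    ( cd "$work" && "$@" ) || return 1

    # Stage out: local scratch -> shared area, then clean up.
    cp -R "$work"/. "$output_dir"/ || return 1
    rm -rf "$work"
}
```

In a job script this might be called as `run_staged /gpfs/csic_projects/<project>/input /gpfs/csic_projects/<project>/output ./my_analysis`, where the directory layout and `my_analysis` are placeholders. If the command fails, the function returns non-zero before copying anything back, and the scratch copy is left for the batch system to remove.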

5.2. Disk quotas

Disk quotas are enabled on both user and projects filesystems. A message with this information should be shown upon login. If you need more quota on your user space (not in the project shared area), please contact the system administrators explaining your reasons.

If you wish to check your quota at a later time, you can use the commands mmlsquota gpfs_csic (for user quotas) and mmlsquota -g id -g gpfs_projects (for group quotas). A script reporting both quotas is located at /nfs4/usr/bin/rep-user-quotas.py. Sample output from the latter:

**********************************************************************
                    INFORMATION ABOUT YOUR CURRENT DISK USAGE

USER                Used      Soft      Hard     Doubt     Grace
Space (GB):         3.41     20.00      0.00      0.06      none
Files (x1000):        64         0         0         0      none

GROUP               Used      Soft      Hard     Doubt     Grace
Space (GB):         0.00   1000.00   1500.00      0.00      none
Files (x1000):         0         0         0         0      none
**********************************************************************

For a basic interpretation of this output, note that the "Used" column tells you how much disk space you are using, whereas "Soft" denotes the limit that this "Used" amount should not exceed. The "Hard" column is the limit that "Used" plus "Doubt" must not cross. Healthy disk space management requires that you periodically delete unused files in your $HOME directory, keeping its usage below the limits at all times. If a user exceeds a limit, a grace period will be shown in the "Grace" column. If the user does not correct the situation within the grace period, they will be blocked from writing to the disk.

For further information you can read the mmlsquota command manual page.
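
The interpretation rules above can be expressed as a small check. The sketch below is illustrative only: `quota_status` is a hypothetical helper, not part of the cluster tooling, and it assumes that a limit of 0 means "no limit set" (which the sample output suggests, since the user "Hard" value is 0.00 while "Soft" is 20.00).

```shell
#!/bin/sh
# quota_status USED SOFT HARD DOUBT   (hypothetical helper)
# Classifies one row of the quota report:
#   - over hard limit: USED + DOUBT is above HARD
#   - over soft limit: USED is above SOFT
#   - ok: neither limit is exceeded
# Assumption: a limit of 0 means that no limit is set.
quota_status() {
    awk -v used="$1" -v soft="$2" -v hard="$3" -v doubt="$4" 'BEGIN {
        if (hard > 0 && used + doubt > hard)
            print "over hard limit"
        else if (soft > 0 && used > soft)
            print "over soft limit"
        else
            print "ok"
    }'
}
```

For example, the user "Space (GB)" row of the sample report would be checked with `quota_status 3.41 20.00 0.00 0.06`.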

6. Extra utils/Software

Some extra packages can be found at /nfs4/opt/. This is the location for some commonly used pieces of software, such as:

  • The Matlab-like octave language for numerical analysis.

  • The gnuplot data plotting program.

  • The valgrind profiling and debugging tools.

Please note that these packages are provided as-is, without further support from IFCA staff.

7. Support

Before opening a new incidence, please check the Frequently Asked Questions page.

Questions, support requests and feedback should be directed to the Helpdesk.


CategoryUserSupport

eciencia: Cluster/Usage (last edited 2017-02-17 08:58:31 by aloga)