This documentation relates to the National Service systems listed below:
If you are not currently an ICHEC user, you should first visit our Services section to decide how you would like to become one. The possible options are: (a) submit a project application through the Full National Service, (b) join an existing project, or (c) gain access through your institution's condominium.
All use of the National Service systems is subject to the ICHEC Acceptable Usage Policy (AUP).
When registration is complete you can log in using SSH, which is installed by default on most Unix-style systems. Windows users will need to download and install an SSH client such as OpenSSH or PuTTY. From the command line you can log on using the commands:
If you wish to run X Window based graphical applications, use the ssh -X flag.
Once you have an account you may join multiple projects subject to the approval of the project Principal Investigator.
File transfer is available via sftp.
scp is also available.
The Helpdesk is the main entry point to ICHEC's support team for users. Here you can get help in using the service, find out more about ICHEC or send us your comments. If the documentation on this site does not resolve your query, do not hesitate to use the Helpdesk to contact ICHEC.
Like most High Performance Computing (HPC) systems, ICHEC's systems run the Unix-like Linux operating system. An introduction to Linux can be found here.
When using ICHEC systems your files will be stored in two locations:
When you connect to the Fionn system using ssh as described above, your connection will automatically be routed to one of three login nodes: fionn1, fionn2 or fionn3. These nodes, sometimes called frontend nodes, are used for interactive tasks such as compiling code and editing and managing files. They are shared by all users and should not be used for intensive computation.
To connect to Fionn you must connect from a machine with an IP address belonging to one of ICHEC's participant institutions. Thus, if you wish to connect from home or while travelling, you must first connect to a machine at your home institution and then connect to Fionn from there.
The vast majority of the Fionn system is made up of compute nodes, sometimes referred to as backend nodes. These nodes run the jobs submitted to the system. Each is dedicated to a single user at any given time and can be used for intensive, long-running workloads.
As stated in our Acceptable Usage Policy, backups are only made of users' home directories. Project directories under /ichec/work/projectname are NOT backed up. Furthermore, backups are only carried out as part of our system failure recovery plan; restoring files that users have deleted accidentally is not provided as a service.
The large number of software packages installed means that incompatibilities are inevitable. To minimise the problems this can cause, you must load the appropriate module(s) in order to use any software package that is not part of the base operating system. Loading a module generally sets environment variables such as your PATH. Modules on Fionn are grouped into different areas, or "base" modules. To see what base modules are available type:
You can then load the appropriate base modules as follows:
|Misc. Application packages|
|Base Module||Classification||Typical modules contained|
|dev||development||Compilers, MPI libraries/wrappers (e.g. Intel compilers, GCC)|
|apps||applications||Compiled applications (e.g. taskfarm, Abaqus, OpenFOAM)|
|libs||libraries||Commonly used libraries (e.g. FFTW, GSL, HDF5, Boost)|
|molmodel||molecular modelling||Molecular modelling applications (e.g. NAMD, Gromacs)|
|bio||bioinformatics||Bioinformatics applications (e.g. NCBI-BLAST)|
|python||python||Python releases and modules (e.g. Python3, NumPy, SciPy)|
|phi||Xeon Phi||Compilers/applications specific to Intel Xeon Phi co-processors|
As well as being loaded at compile time, the necessary modules must also be loaded at runtime on the compute nodes. If they are not, the program is likely to crash because it cannot find the required shared libraries and other files. They can be loaded in two ways: you can use the PBS directive #PBS -V to import your current environment settings at submission time, or you can add module load base_name package_name commands to the submission script itself.
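A missing runtime module usually shows up as shared libraries that cannot be found. As a minimal sketch, assuming ldd is available on the compute nodes, the hypothetical helper below can be called in a job script after the module load lines, so that the job stops straight away instead of crashing part-way through. check_libs and ./my_prog are illustrative names only.

```bash
#!/usr/bin/env bash
# check_libs (hypothetical helper): report shared libraries that the dynamic
# linker cannot resolve for a program, e.g. because a module was not loaded.
check_libs() {
  local prog=$1 missing
  if [ ! -e "$prog" ]; then
    echo "$prog: no such file" >&2
    return 1
  fi
  # ldd prints "libfoo.so => not found" for each unresolved library;
  # "|| true" keeps the assignment safe under "set -e" when nothing matches.
  missing=$(ldd "$prog" 2>/dev/null | grep 'not found' || true)
  if [ -n "$missing" ]; then
    echo "Unresolved libraries for $prog (load the modules used at compile time):" >&2
    echo "$missing" >&2
    return 1
  fi
  return 0
}

# In a PBS script, after the module load lines (hypothetical program name):
#   check_libs ./my_prog || exit 1
#   mpiexec ./my_prog my_args
```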
Other useful module commands:
For more information on modules see: Using Modules or type man module. Note: the version of a package needs to be stated explicitly when loading/unloading.
Both the GNU and Intel compiler suites are available on Fionn and Stoney. The GNU suite is available by default; to use the Intel compilers you must load the relevant modules (intel/<version> on Fionn, and intel-cc or intel-fc on Stoney). In general the Intel compilers should give better performance and are recommended.
|Intel Compilers||MPI wrappers around Intel compilers|
|C||icc||Fionn: mpiicc - Stoney: mpicc|
|C++||icpc||Fionn: mpiicpc - Stoney: mpicxx|
|Fortran 77||ifort||Fionn: mpiifort - Stoney: mpif77|
|Fortran 90||ifort||Fionn: mpiifort - Stoney: mpif90|
|GNU Compilers||MPI wrappers around GNU compilers|
When using OpenMP on Fionn you need to be aware that Hyperthreading is enabled by default. This means that each physical core appears as two logical cores, so by default an OpenMP program will typically try to use 48 threads rather than the 24 one might expect. Typical HPC workloads do not benefit from oversubscribing the physical cores unless the code is constrained by I/O.
The environment variable OMP_NUM_THREADS is normally used to control how many threads an OpenMP program will use. It can be set in the PBS job script before launching the program as follows:
export OMP_NUM_THREADS=24
Hyperthreading is not supported on Stoney. Further information can be found in Intel's OpenMP documentation.
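Rather than hard-coding 24 (Fionn) or 8 (Stoney), a job script could derive the thread count from the number of physical cores. The sketch below assumes lscpu (from util-linux) is installed on the compute nodes. It counts each distinct core/socket pair once, so the two logical CPUs that Hyperthreading presents for each physical core are not double-counted. The helper name physical_cores is hypothetical.

```bash
#!/usr/bin/env bash
# physical_cores (hypothetical helper): lscpu --parse prints one "core,socket"
# line per logical CPU; counting unique lines counts each physical core once.
physical_cores() {
  local n
  n=$( { lscpu --parse=CORE,SOCKET 2>/dev/null || true; } |
       awk '!/^#/ && !seen[$0]++ { n++ } END { print n + 0 }' )
  if (( n > 0 )); then
    echo "$n"
  else
    nproc   # fallback if lscpu is unavailable: logical CPUs usable by this process
  fi
}

# One OpenMP thread per physical core (24 on a standard Fionn node).
export OMP_NUM_THREADS="$(physical_cores)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```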
There are a number of MPI libraries available, and sometimes it is preferable to use one rather than another. However, unless you have a specific reason to do otherwise, we recommend using the default libraries.
On Fionn, the MPI library is bundled with the intel module:
On Stoney there are two MPI modules to choose between:
These modules provide support for MPI-2 and for the InfiniBand networking used in these machines. They also provide the compiler wrapper scripts listed in the tables in the previous section, which greatly simplify compiling and linking MPI-based codes.
To run an MPI job, use the mpiexec command in a job submission script, e.g. mpiexec ./my_prog my_args. Again, see the mpiexec man page for more details; see also Batch Processing below.
When you run a program that mixes MPI and OpenMP parallelisation, the way processes and threads are attached (pinned) to the physical cores, already an important factor in general, becomes even more crucial for good performance. Before looking at how pinning is done, let us see how cores and memory are organised on both machines with some simple diagrams.
Fionn system architecture
Stoney system architecture
As we can see, the two machines have different numbers of cores and different core-numbering policies. But that's not all. By default, the MPI libraries on both Fionn and Stoney attach MPI processes to cores sequentially when starting a program. However, since the recommended MPI library is Intel MPI on Fionn and MVAPICH2 on Stoney, the default attachment is done slightly differently on the two machines.
On Fionn, Intel MPI always tries to attach each process to as many of the available cores as possible, while keeping the set of cores each process is attached to as compact as possible, as in the following examples:
Most of the time this default attachment will be the most suitable one for you, since it maximises memory locality between the threads of each MPI process, so you should not need to adjust it. However, should you require a different pinning scheme, you can adjust the I_MPI_PIN_DOMAIN environment variable as explained here. Finally, you should only need to export the OMP_NUM_THREADS environment variable with the value you need in your PBS script, and your mpiexec command line should look like the following:
mpiexec -ppn $(( 24 / $OMP_NUM_THREADS )) my_program my_arguments
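Note that $(( 24 / $OMP_NUM_THREADS )) silently truncates when the thread count does not divide 24: with OMP_NUM_THREADS=5 it starts 4 processes per node and leaves 4 cores idle. A minimal sketch of a guard for a PBS script, using a hypothetical helper ranks_per_node, is:

```bash
#!/usr/bin/env bash
# ranks_per_node (hypothetical helper): MPI processes per node such that
# processes x OpenMP threads fills the node's cores exactly.
ranks_per_node() {
  local cores=$1 threads=$2
  if (( threads < 1 || cores % threads != 0 )); then
    echo "OMP_NUM_THREADS=$threads does not evenly divide $cores cores" >&2
    return 1
  fi
  echo $(( cores / threads ))
}

# Inside a Fionn PBS script (24 cores per standard node):
#   export OMP_NUM_THREADS=6
#   PPN=$(ranks_per_node 24 "$OMP_NUM_THREADS") || exit 1
#   mpiexec -ppn "$PPN" my_program my_arguments
```

The same helper applies on Stoney with 8 in place of 24.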
On Stoney, MVAPICH2 by default attaches each process to a single core, sequentially: on each node, the MPI process with the lowest rank is attached to core number 0, the process with the next rank up to core number 1, and so on until there are no MPI processes left. Hence, if you want to mix MPI and OpenMP, the first thing you have to do is change this process-to-core attachment. Otherwise it is inherited by all the OpenMP threads spawned by a given MPI process, so all of those threads run on that single core, leading to extremely poor performance. The process-to-core attachment is controlled through an environment variable that lists, in sequence, the cores to which each MPI process should be attached. The default behaviour corresponds to:
The core lists for successive MPI processes are separated by colons. On Stoney, three typical attachment policies may be explored, listed here in decreasing order of likely efficiency:
Both environment variables have to be exported in the PBS job script, and the mpiexec command line should resemble the following:
mpiexec -npernode $(( 8 / $OMP_NUM_THREADS )) my_program my_arguments
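The variable holding the core lists is not named above; we assume it is MVAPICH2's MV2_CPU_MAPPING, which uses colons between processes and commas between the cores of one process. The sketch below (hypothetical helper mv2_mapping) builds a compact mapping that gives each process a block of consecutive cores, so 8 cores with OMP_NUM_THREADS=4 becomes 0,1,2,3:4,5,6,7, and one thread per process reproduces the default 0:1:2:3:4:5:6:7. Whether consecutive core numbers share a socket depends on Stoney's numbering policy shown in the architecture diagram, so check that before relying on this layout.

```bash
#!/usr/bin/env bash
# mv2_mapping (hypothetical helper): build a colon-separated list with one
# comma-separated block of consecutive cores per MPI process.
mv2_mapping() {
  local cores=$1 threads=$2 map="" block first c
  if (( threads < 1 || cores % threads != 0 )); then
    echo "threads ($threads) must evenly divide cores ($cores)" >&2
    return 1
  fi
  for (( first = 0; first < cores; first += threads )); do
    block=""
    for (( c = first; c < first + threads; c++ )); do
      block+="${block:+,}$c"     # e.g. 0,1,2,3
    done
    map+="${map:+:}$block"       # e.g. 0,1,2,3:4,5,6,7
  done
  echo "$map"
}

# Inside a Stoney PBS script (8 cores per node; variable name assumed):
#   export OMP_NUM_THREADS=4
#   MV2_CPU_MAPPING=$(mv2_mapping 8 "$OMP_NUM_THREADS") || exit 1
#   export MV2_CPU_MAPPING
#   mpiexec -npernode $(( 8 / OMP_NUM_THREADS )) my_program my_arguments
```

The assignment and the export are kept separate so that a failed mv2_mapping call actually stops the script; "export VAR=$(...)" would always succeed.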
The Intel Math Kernel Library (MKL) is a very useful package. It provides optimised and documented versions of a large number of common mathematical routines. It supports both C and Fortran interfaces for most of these. It features the following routines:
If your code depends on standard libraries such as BLAS or LAPACK, it is recommended that you link against the MKL versions for optimal performance.
Parallelism in a program can be achieved at the process level, as in most MPI development, at the thread level, as in OpenMP development, or with a mix of the two, a so-called hybrid code. The most common mode of development on our systems is MPI-based, as this allows you to write programs that can run across many nodes. Such codes often call routines provided by MKL. However, many of these routines are themselves parallel, so at the node level two levels of parallelism end up contending with one another. To prevent this, the MKL module sets the environment variable MKL_NUM_THREADS=1. If you are writing hybrid code or pure OpenMP code that uses MKL, you may need to override this setting. Chapter 6 of the MKL user guide explains in detail how this and other related environment variables can be used. Note that if you have used a version of MKL older than 10.0, MKL's method for controlling thread numbers has since changed.
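The contention described above comes down to simple arithmetic: processes per node multiplied by threads per process should not exceed the physical cores. A minimal sketch of a check that could be placed in a job script before mpiexec follows. check_oversubscription is a hypothetical name, and the thread count passed in is whatever MKL_NUM_THREADS (or OMP_NUM_THREADS) is set to.

```bash
#!/usr/bin/env bash
# check_oversubscription (hypothetical helper): fail if MPI processes per node
# times threads per process exceeds the physical cores of the node.
check_oversubscription() {
  local cores=$1 ranks=$2 threads=$3
  local used=$(( ranks * threads ))
  if (( used > cores )); then
    echo "warning: $ranks ranks x $threads threads = $used > $cores cores" >&2
    return 1
  fi
  return 0
}

# Pure MPI on a 24-core Fionn node (the MKL module sets MKL_NUM_THREADS=1):
#   check_oversubscription 24 24 "$MKL_NUM_THREADS"
# Hybrid code with 4 processes per node, each allowed 6 MKL threads:
#   export OMP_NUM_THREADS=6 MKL_NUM_THREADS=6
#   check_oversubscription 24 4 "$MKL_NUM_THREADS"
```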
Thread contention can also be avoided by explicitly linking against the sequential versions of the MKL libraries, which are in the $MKLROOT/lib/intel64 directory and have _sequential in their names. Note that you must also link the pthread library.
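As a sketch, assuming the LP64 interface, the Intel compilers and the 10.x-style layered library names (mkl_intel_lp64, mkl_sequential, mkl_core), a sequential link line could be assembled as below. GNU Fortran, ILP64 or other MKL versions need different libraries, so confirm the exact line against the MKL documentation or Intel's MKL Link Line Advisor. The fallback path and my_prog.c are placeholders.

```bash
#!/usr/bin/env bash
# MKLROOT is normally set by the MKL module; the fallback below is a
# placeholder for illustration only.
MKLROOT=${MKLROOT:-/path/to/mkl}
MKL_LIBDIR="$MKLROOT/lib/intel64"

# The sequential libraries can be listed with:
#   ls "$MKL_LIBDIR"/*sequential*
# Sequential (single-threaded) MKL, LP64 interface; pthread is still required.
MKL_SEQ_LIBS="-L$MKL_LIBDIR -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread -lm"

echo "$MKL_SEQ_LIBS"
# Example link step (hypothetical source file); $MKL_SEQ_LIBS is left
# unquoted on purpose so that it splits into separate flags:
#   mpiicc -O2 -o my_prog my_prog.c $MKL_SEQ_LIBS
```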
Extensive, high-quality MKL documentation can be found in $MKLROOT/doc. Remember that when a code is linked against MKL, the MKL module must also be loaded via the submit script when running the code. Further ICHEC documentation on MKL can be found here.
To use compute resources fairly and efficiently, all compute jobs must be run through the batch queueing system. The system supports three main classes of jobs:
By specifying how many processor cores you need and for how long, the system can fit jobs from multiple users into the available resource time slots. The most common operations you will need to perform with the batch system are submitting jobs, monitoring the queues and cancelling your jobs.
As detailed in the next section, it is straightforward to submit jobs to a specific queue. In general, however, letting the system decide which queue to use gives the best results, except where you need to use the hybrid or shared memory partition on Fionn. The system bases this decision on the requested walltime and number of cores, so it is in your interest to provide a reasonably accurate walltime.
Before submitting a job you normally prepare a PBS script.
The # symbol is required at the start of each PBS directive. The line #PBS -l nodes=2:ppn=24 requests 2 nodes with 24 cores each, i.e. 48 processor cores in total. As each standard Fionn node has 24 cores, this ppn value is fixed. Each Stoney node, however, has 8 cores, so there ppn should be set to 8, and the same request would be for 16 cores. Note that the value of ppn is 20 on the Fionn hybrid partition and 8 on the shared memory component.
The line #PBS -l walltime=1:00:00 requests a walltime of 1 hour. If the job does not complete before this time the system will kill it. #PBS -N my_job_name sets the job name as it will appear in the queue.
The project_name is used to associate the core hours used with a given project; you may only specify projects you are a member of. #PBS -r n indicates that the job should not automatically be rerun if it fails. #PBS -j oe joins the output and error streams into a single file. To receive an email at the address specified with -M when a job begins, ends or aborts, use #PBS -m bea.
Note, don't specify a project_name until accounting is turned on in January 2014.
The #PBS -V directive is very important if you do not explicitly load modules in the PBS script, as it causes environment settings to be imported from the submission environment into the runtime environment. The script then changes to the working directory and starts the job using mpiexec. If the job is based solely on OpenMP, and so runs on a single node, do not use mpiexec.
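The script itself is not reproduced above; a sketch assembled from the directives just described might look like this. Two assumptions apply: the directive carrying project_name is taken to be the standard PBS account option -A (omit it while accounting is not enabled, as noted above), and the e-mail address, module names, my_prog and my_args are placeholders. On Stoney, use ppn=8.

```bash
#!/bin/bash
#PBS -l nodes=2:ppn=24
#PBS -l walltime=1:00:00
#PBS -N my_job_name
# Account to charge (assumed to be the standard -A option):
#PBS -A project_name
#PBS -r n
#PBS -j oe
#PBS -M you@example.com
#PBS -m bea
#PBS -V

# Instead of relying on -V, the runtime modules could be loaded explicitly,
# e.g. (hypothetical module names and version):
#   module load dev intel/<version>

# PBS starts the job in your home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"

mpiexec ./my_prog my_args
```

Submit it with qsub scriptname.pbs. The same -l requests can also be given on the qsub command line; for instance, the 48-core, 30-minute interactive session described below would be requested with something like qsub -I -l nodes=2:ppn=24,walltime=0:30:00.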
You can choose to send your job explicitly to a given queue using the #PBS -q directive or the qsub -q option.
To see which queues are available, use the qstat -q command. Note that not all queues listed by qstat -q are available to users, and that the Walltime and Node columns list the maximum runtime and node count for jobs in that queue. Access to the hybrid and shared memory partitions is granted on a per-project basis during the application process or through the helpdesk.
To submit a PBS script, type qsub scriptname.pbs.
Sometimes, for debugging purposes, it can be useful to launch a shell as a batch job and get an interactive session on compute nodes, where you can see immediately what happens when you launch a program. In these cases an interactive job can be used. Note that interactive jobs will only run in the DevQ region. For example, to test an MPI program on 48 cores, you could request an interactive job for 30 minutes and then be given a shell on one of the 2 compute nodes.
Note however that this method should only be used for debugging and not for production runs as network breaks or timeouts will kill the job. Also, please exit the shell when you are no longer using the interactive session so that the resources can be released for other users.
The showq command displays information on the current status of jobs.
showq - status of jobs.
showq -w user=$USER - status of your own jobs only.
showq -w acct=myaccount - status of jobs running under specified account.
To cancel a job you should use the canceljob command.
canceljob JOBID - cancels a job
|qsub SUBMIT_SCRIPT||submit jobscript to PBS|
|qsub -I||submit an interactive-batch job|
|qsub -q queue_name||submit job directly to specified queue|
|qstat -q||list all queues on system|
|qstat -Q||list queue limits for all queues|
|showq||list all running, queued and blocked jobs|
|showq -u userid||list all jobs owned by user userid|
|showq -w acct=myaccount||list all jobs using the specified project account|
|showq -r||list all running jobs|
|mybalance||list the balance in CPU core hours for each project you are a member of|
|qstat -f jobid||list all information known about specified job|
|canceljob JOBID||delete job jobid|
|qalter JOBID||modify the attributes of the job or jobs specified by jobid|
If you wish to run a multithreaded code on a single node which does not use MPI then you can simply call the program from the submission script without prefacing it with the mpiexec command. The job will then have access to the cores on the node. OpenMP based codes are the most common form of this type of job.
It is possible to write a so-called hybrid code that uses both OpenMP and MPI, so that a job uses shared memory within a node and MPI between nodes. In this case you generally want to allocate just one MPI process to each node; this process can then create worker threads to exploit the available cores. To do this, request the required number of nodes in the normal fashion with #PBS -l nodes=n:ppn=24, making sure ppn is set to 24 (or 8 on Stoney). Then launch the job with an additional argument, mpiexec -npernode 1 ./job my_args. With npernode set to 1, a single MPI process is allocated to each node and it is up to this process to use the available cores.
Users are encouraged to use the command "qutil" to investigate the performance of their running jobs.
qutil [ -u username | -j jobid,... | -a | -h ] [ -s ] - usage
For instance:
qutil -a - shows all of your jobs
qutil -j 5675,5677 - lists the two jobs with IDs 5675 and 5677 (but only if they belong to you)
The output from qutil gives a number of useful pieces of information. For each compute node used by a job it lists the 1-, 5- and 15-minute load averages. These figures are a rough measure of how high the compute load on each node is. Ideally this value should be roughly equal to the number of cores in the node, so 24 on Fionn and 8 on Stoney. Not all codes are able to fully utilise all cores all the time, but if the figure is consistently low we recommend you contact us to discuss the implications and the options open to you to improve it. The efficiency figure listed is based on these values and is normalised so that the best case is in the region of 1.0.
The memory utilisation per node is also listed. Memory use can vary far too rapidly to be represented accurately over time by a utility like qutil, but many HPC codes allocate the bulk of their memory at startup and release it only when the job completes, so the figure can still be useful. On Fionn each node has 64GB of RAM and on Stoney each node has 48GB. qutil makes it easy to compare utilisation across the nodes of a job.
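For a quick look from inside a job (for example in an interactive session), a rough, qutil-like snapshot of a single node can be taken from /proc. This is a sketch only: qutil's own efficiency formula is not documented here, so dividing the load averages by the core count is an assumption that simply mirrors the "about 1.0 is best" normalisation described above. MemAvailable requires Linux 3.14 or later, and node_snapshot is a hypothetical name.

```bash
#!/usr/bin/env bash
# node_snapshot (hypothetical helper): 1/5/15-minute load averages divided by
# the core count (about 1.0 means every core is busy) plus memory in use.
node_snapshot() {
  local cores=${1:?usage: node_snapshot CORES}
  awk -v c="$cores" '{ printf "load/core: %.2f %.2f %.2f\n", $1 / c, $2 / c, $3 / c }' /proc/loadavg
  # /proc/meminfo values are in KiB; 1048576 KiB = 1 GiB.
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "memory used: %.1f of %.1f GiB\n", (t - a) / 1048576, t / 1048576 }' /proc/meminfo
}

node_snapshot 24   # 24 cores on Fionn, 8 on Stoney
```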
Users can check their user and project disk quotas with the lfs quota command. Once the hard quota is exceeded no more data can be written. Note that for the moment the quota command is not available on Stoney.
NOTE: The disk usage figures displayed by the quota system are based on actual disk usage, not file size. There will be a minimum of 20% difference between these two figures. Where a lot of extremely small files are present, this difference may exceed 100% due to partial disk block use and performance optimisation.
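The gap between disk usage and file size is easy to see with GNU du, which can report both the blocks a directory occupies and the sum of its file sizes (--apparent-size). The sketch below, with a hypothetical helper usage_vs_size, creates 200 one-byte files in a temporary directory to show the effect. Lustre's lfs quota may not match du exactly, so treat this as an illustration rather than a quota check.

```bash
#!/usr/bin/env bash
# usage_vs_size (hypothetical helper): compare the space a directory occupies
# on disk with the sum of its file sizes. Needs GNU du for --apparent-size.
# Prints "<disk KiB> <apparent KiB>".
usage_vs_size() {
  local dir=${1:-.}
  local used apparent
  used=$(du -sk "$dir" | cut -f1)
  apparent=$(du -sk --apparent-size "$dir" | cut -f1)
  echo "$used $apparent"
}

# Example: many tiny files, each of which occupies at least one disk block.
demo=$(mktemp -d)
for (( i = 1; i <= 200; i++ )); do printf 'x' > "$demo/f$i"; done
read -r used apparent <<< "$(usage_vs_size "$demo")"
echo "on disk: ${used} KiB, sum of file sizes: ${apparent} KiB"
rm -rf "$demo"
```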
To find out how many resources (core hours) are available to your project, use the mybalance command as follows:
This command will return the number of core hours available to each of your projects (in the above examples, icphy001 and icphy001c). So, for instance, if you wish to run a 48-core job for 24 hours, you will need to ensure that your project's account has at least 24*48=1152 core hours.
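The same arithmetic can be applied directly to a job request. This is a minimal sketch that assumes the walltime is given as HH:MM:SS and rounds up to whole core hours; the actual charging rules may differ. walltime_seconds and core_hours are hypothetical names.

```bash
#!/usr/bin/env bash
# walltime_seconds (hypothetical helper): convert HH:MM:SS to seconds.
walltime_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10 so fields such as "08" are not read as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# core_hours (hypothetical helper): cores x walltime, rounded up to whole hours.
core_hours() {
  local cores=$1 secs
  secs=$(walltime_seconds "$2")
  echo $(( (cores * secs + 3599) / 3600 ))
}

# nodes=2:ppn=24 for 24 hours, as in the example above.
core_hours $(( 2 * 24 )) 24:00:00   # prints 1152
```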
A number of online lectures and tutorials can be found on our website. Please check the Education & Training page for further training courses being organised by ICHEC.