
Fionn

Contents

  1. Getting Started
  2. Environment, Applications & Development
  3. Batch Processing
  4. Performance
  5. Misc

1. Getting Started

This documentation relates to the National Service system, Fionn.

If you are not currently an ICHEC user then you should visit our Services section first to determine how you would like to become a new user. Possible options are: (a) submit a project application through the Full National Service, (b) join an existing project, or (c) gain access through your institution's condominium.

All use of the National Service systems is subject to the ICHEC Acceptable Usage Policy (AUP).

1.1 Logging In

When registration is complete you can log in using SSH, which is installed by default on most Unix-style systems. Windows users will need to download and install an SSH client such as OpenSSH, PuTTY or MobaXterm. From the command line you can log in using the command:

  • Fionn: ssh username@fionn.ichec.ie

If you wish to run X Window based graphical applications, use the -X flag to ssh.
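For example (replace username with your own account name):

ssh -X username@fionn.ichec.ie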

Once you have an account you may join multiple projects subject to the approval of the project Principal Investigator.

1.2 Transferring Files

File transfer is available via sftp:

  • Fionn: sftp username@fionn.ichec.ie

scp is also available

  • Fionn: scp text.tar username@fionn.ichec.ie:

There are a number of graphical applications such as WinSCP or FileZilla which can also be used for file transfer.
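For example, to copy a results archive from your project's work area on Fionn back to your local machine (the project name and file name here are placeholders):

scp username@fionn.ichec.ie:/ichec/work/projectname/results.tar .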

1.3 The Helpdesk

The Helpdesk is the main entry point to ICHEC's support team for users. Here you can get help in using the service, find out more about ICHEC or send us your comments. If the documentation on this site does not resolve your query, do not hesitate to contact ICHEC through the Helpdesk.

1.4 Unix

As is generally the case for High Performance Computing (HPC) systems, ICHEC's systems run the Unix-style Linux operating system. An introduction to this system can be found here.

1.5 Directories

When using ICHEC systems your files will be stored in two locations:

  • Home: /ichec/home/users/username
  • Work: /ichec/work/projectname
Home has a relatively small storage quota and should be used for personal files and source code related to your use of the system. It is not suited to storing large volumes of data such as simulation results. Work is an area of common storage shared by all the members of a project and normally has a much larger quota; in practice this is where the majority of files should be stored. Note that only home directories are backed up to tape; project directories under /ichec/work/projectname are NOT backed up.

1.6 Login and Compute Nodes

When you connect to the Fionn system using ssh as described above, your connection will automatically be routed to one of three login nodes: fionn1, fionn2 or fionn3. These nodes, sometimes called frontend nodes, are used for interactive tasks such as compiling code, editing files and managing files. They are shared by all users and should not be used for intensive computation.

In order to connect to Fionn it is necessary to connect from a machine with an IP address belonging to one of ICHEC's participant institutions. Thus, if you wish to connect from home or while travelling, you must first log in to a machine in your home institution and then connect to Fionn from there.

The vast majority of the Fionn system is made up of compute nodes, sometimes referred to as backend nodes. These nodes are used for running jobs submitted to the system. They are dedicated to a single user at a given time and can be used for intensive, long-running workloads.

1.7 Backup Policy

As stated in our Acceptable Usage Policy, backups are only made of users' home directories. Project directories under /ichec/work/projectname are NOT backed up. Furthermore, backups are only carried out as part of our system failure recovery plan; the restoration of user files deleted accidentally is not provided as a service.

2. Environment, Applications & Development

2.1 Modules

The large array of software packages installed means that incompatibilities are inevitable. To minimise the problems this can cause, you must load the appropriate module(s) in order to use a software package that is not part of the base operating system. Loading a module generally sets environment variables such as your PATH. Modules on Fionn are categorised into different areas or "base" modules. To see what base modules are available type:

module avail

You can then load the appropriate base modules as follows:

module load apps
module load dev
Base Module   Classification         Typical modules contained
dev           development            Compilers, MPI libraries/wrappers (e.g. Intel compilers, GCC)
apps          applications           Compiled applications (e.g. taskfarm, Abaqus, OpenFOAM)
libs          libraries              Commonly used libraries (e.g. FFTW, GSL, HDF5, Boost)
molmodel      molecular modelling    Molecular modelling applications (e.g. NAMD, Gromacs)
bio           bioinformatics         Bioinformatics applications (e.g. NCBI-BLAST)
python        python                 Python releases and modules (e.g. Python3, NumPy, SciPy)
phi           Xeon Phi               Compilers/applications specific to Intel Xeon Phi co-processors
misc          miscellaneous          Misc. application packages
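For example, a typical workflow is to load a base module and then a specific package from it (a sketch; intel/2013-sp1 is the version referenced elsewhere in this documentation):

module load dev
module avail                 # now also lists the development packages
module load intel/2013-sp1
module list                  # confirm which modules are loaded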

As well as loading the necessary modules at compile time, they must also be present at runtime on the compute nodes. If these modules are not loaded, the program is likely to crash because it cannot find the required libraries. They can be loaded in two ways: you can use the PBS directive #PBS -V to import your current environment settings at submission time (this is very dangerous, do not use it), or you can add module load base_name package_name commands to the submission script itself.
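For example, a submission script might contain lines such as the following before the program is launched (a sketch; the library module shown is illustrative only):

module load dev intel/2013-sp1
module load libs fftw/3.3.3        # illustrative: a library the code was built against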

Other useful module commands:

  • module unload intel/2013-sp1 (removes that module)
  • module list (lists the modules you are using at the moment)

For more information on modules see: Using Modules or type man module. Note: the version of a package needs to be stated explicitly when loading/unloading.

2.2 Compilers

Both the GNU and Intel compiler suites are available on Fionn.

The GNU suite is available by default; to use the Intel compilers you must load the relevant module (intel/<version> on Fionn). In general the Intel compilers give better performance and are recommended.

              Intel Compilers   MPI wrappers around Intel compilers
C             icc               Fionn: mpiicc
C++           icpc              Fionn: mpiicpc
Fortran 77    ifort             Fionn: mpiifort
Fortran 90    ifort             Fionn: mpiifort
OpenMP        yes               -
Intel compilers on Fionn.

              GNU Compilers     MPI wrappers around GNU compilers
C             gcc               mpicc
C++           g++               mpicxx
Fortran 77    gfortran          mpif77
Fortran 90    gfortran          mpif90
OpenMP        yes               -
GNU compilers on Fionn.
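For example, a serial C code and an MPI C code could be compiled with the Intel suite as follows (a sketch; the source and program names are placeholders, and the module is loaded following the base/package pattern of section 2.1):

module load dev intel/2013-sp1
icc -O2 -o my_prog my_prog.c
mpiicc -O2 -o my_mpi_prog my_mpi_prog.c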

2.3 OpenMP

When using OpenMP on Fionn you need to be aware that Hyperthreading is enabled by default. This means that each physical core appears as two logical cores, so by default an OpenMP program will typically try to use 48 threads rather than the 24 one might expect. Typical HPC workloads will not benefit from oversubscribing the physical cores unless the code is constrained by I/O.

The environment variable OMP_NUM_THREADS is normally used to control how many threads an OpenMP program will use. It can be set in the PBS job script prior to launching the program as follows:

export OMP_NUM_THREADS=24

Further information can be found in Intel's OpenMP documentation.
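As a sketch, an OpenMP code could be compiled with the Intel compiler and run on the 24 physical cores of a node as follows (-openmp is the flag used by Intel compilers of this generation; the source and program names are placeholders):

icc -openmp -O2 -o omp_prog omp_prog.c
export OMP_NUM_THREADS=24      # one thread per physical core, avoiding oversubscription
./omp_prog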

2.4 MPI

There are a number of MPI libraries available and sometimes it is preferable to use one rather than another; however, unless you have a specific reason to do so, it is recommended to use the default library.

For the Fionn system, the MPI library comes embedded in the intel package module:

  • module load intel/<version> (for use with both the Intel and the GNU compiler suites)

These modules provide support for MPI-2 and the InfiniBand-based network used on the machine. They also provide the compiler wrapper scripts listed in the tables in the previous section, which greatly simplify compiling and linking MPI-based codes.

To run an MPI job, the mpiexec command is used in a job submission script, e.g. mpiexec ./my_prog my_args.

Again, the mpiexec man page gives more details. See also Batch Processing below.

2.5 Mixing MPI and OpenMP

When you run a program where MPI and OpenMP parallelisation strategies are mixed, the way processes and threads are attached (pinned) to the physical cores, already an important point in general, becomes even more crucial for achieving good performance. But before we see how the pinning is done, let's look at how cores and memory are organised with a simple diagram.

[Figure: Fionn system architecture]

On Fionn, Intel MPI will by default try to attach each process to as many of the available cores as possible, while keeping the set of cores each process is attached to as compact as possible, as per the following examples:

  • For 24 MPI processes per node: process 0 will be attached to core 0, process 1 to core 1, ..., process 23 to core 23;
  • For 12 MPI processes per node: process 0 will be attached to cores 0 and 1, process 1 to cores 2 and 3, ..., process 11 to cores 22 and 23;
  • For 6 MPI processes per node: process 0 will be attached to cores 0, 1, 2 and 3, ..., process 5 to cores 20, 21, 22 and 23;
  • ...

Most of the time, this default attachment will be the most suitable one for you, since it maximises memory locality between the various threads of the same MPI process; you should not, therefore, have to adjust it. However, should you require a different pinning scheme, you can adjust the I_MPI_PIN_DOMAIN environment variable as explained here. Finally, you should only have to export the OMP_NUM_THREADS environment variable to the value you need in your PBS script, and your mpiexec command line should look like the following one:

mpiexec -ppn $(( 24 / $OMP_NUM_THREADS )) my_program my_arguments
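For instance, a minimal PBS script fragment for a hybrid job using 2 MPI processes per node with 12 OpenMP threads each might look as follows (a sketch; my_hybrid_prog is a placeholder and the module names follow those used elsewhere in this documentation):

#PBS -l nodes=2:ppn=24
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
module load dev intel/2013-sp1
export OMP_NUM_THREADS=12
# 24 cores per node divided by 12 threads gives 2 MPI processes per node
mpiexec -ppn $(( 24 / $OMP_NUM_THREADS )) ./my_hybrid_prog my_arguments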

2.6 MKL

The Intel Math Kernel Library (MKL) is a very useful package. It provides optimised and documented versions of a large number of common mathematical routines. It supports both C and Fortran interfaces for most of these. It features the following routines:

  • Basic Linear Algebra Subprograms (BLAS); vector, matrix-vector, matrix-matrix operations.
  • Sparse BLAS Levels 1, 2, and 3.
  • LAPACK routines for linear equations, least squares, eigenvalue and singular value problems, and Sylvester's equation.
  • ScaLAPACK routines.
  • PBLAS routines for distributed vector, matrix-vector and matrix-matrix operations.
  • Direct and iterative sparse solver routines.
  • Vector Mathematical Library (VML) for computing mathematical functions on vector arguments.
  • Vector Statistical Library (VSL) for generating pseudorandom numbers and for performing convolution and correlation.
  • General Fast Fourier Transform (FFT) functions for fast computation of Discrete FFTs.
  • Cluster FFT functions.
  • Basic Linear Algebra Communication Subprograms (BLACS)
  • GNU multiple precision arithmetic library.

If your code depends on standard libraries such as BLAS or LAPACK, it is recommended that you link against the MKL versions for optimal performance.

Parallelism in a program can be achieved at the process level, as in most MPI development, at the thread level, as in OpenMP development, or in some mix of these approaches, a so-called hybrid code. The most common mode of development on our systems is MPI based, as this allows you to write programs which can run across many nodes. Often such codes will want to call routines provided by MKL. However, many of these routines are themselves parallel, so at the node level one is left with two levels of parallelism contending with one another. To eliminate this, the MKL module sets the environment variable MKL_NUM_THREADS=1. If you are writing hybrid code or pure OpenMP code that uses MKL you may need to override this setting. Chapter 6 of the MKL user guide explains in detail how this and other related environment variables can be used. Note that if you have used a version of MKL older than 10.0 you should be aware that MKL's method for controlling thread numbers has changed.

This issue can also be addressed by explicitly linking against the sequential version of the libraries, which can be found in the $MKLROOT/lib/intel64 directory and are identified by _sequential in the name. Note that you are also required to link the pthread library.
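As a sketch, with the Intel compilers the -mkl flag selects which MKL variant is linked (the Intel compiler module must be loaded; the source file name is a placeholder):

ifort -O2 -o my_prog my_prog.f90 -mkl=parallel      # threaded MKL, thread count set by MKL_NUM_THREADS
ifort -O2 -o my_prog my_prog.f90 -mkl=sequential    # sequential MKL, avoiding nested parallelism in MPI codes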

Extensive, high quality MKL documentation can be found in the $MKLROOT/doc directory. Remember that when a code is linked against MKL it is necessary to have the MKL module loaded via the submit script when running the code. Further ICHEC documentation on MKL can be found here.

3. Batch Processing

To try to utilise compute resources in a fair and efficient manner, all compute jobs must be run through the batch queueing system. The system supports three main classes of jobs:

  • Production jobs - These are day to day production jobs which potentially run for long periods over large numbers of cores.
  • Development jobs - Development jobs are generally of short duration over a limited number of cores and are typically used for testing and developing while modifying code.
  • Interactive development jobs - Such jobs have the same purpose as regular development jobs however when the submission takes place you are given a command prompt on one of the allocated backend nodes from where you can run commands interactively, much as you would were a queueing system not in place.

By specifying how many processor cores you need and for how long, the system can mix and match resource timeslots with jobs from multiple users. The most common operations you will need to perform with the batch system are submitting jobs, monitoring the queues or canceling your jobs.

As detailed in the next section, it is straightforward to submit jobs to a specific queue. However, in general, allowing the system to decide which queue to use will give the best results, except in cases where you need to use the hybrid or shared memory partition on Fionn. This decision is based on the requested walltime and the number of cores requested; hence it is in your interest to provide a reasonably accurate walltime.

3.1 Sample PBS script for Fionn

Before submitting a job you normally prepare a PBS script, for example:

#!/bin/bash
#PBS -l nodes=2:ppn=24
#PBS -l walltime=1:00:00
#PBS -N my_job_name
#PBS -A project_name
#PBS -r n
#PBS -j oe
#PBS -m bea
#PBS -M me@my_email.ie

cd $PBS_O_WORKDIR
module load libs intel-runtime/2013-sp1
mpiexec ./my_prog my_args

Each PBS directive must begin with #PBS. The line #PBS -l nodes=2:ppn=24 requests 48 processor cores in this case, i.e. 2 nodes, each of which has 24 cores. As each Fionn node has 24 cores, this ppn figure is fixed. Note that the value for ppn is 20 on the Fionn hybrid partition and 8 on the shared memory component.

The line #PBS -l walltime=1:00:00 requests a walltime of 1 hour. If the job does not complete before this time the system will kill it. #PBS -N my_job_name sets the job name as it will appear in the queue.

The project_name is used to associate core hours used with a given project. You may only specify projects you are a member of. #PBS -r n indicates that the job should not automatically be rerun if it fails. #PBS -j oe joins the output and error streams into a single file. To receive a mail at the address specified with -M when a job begins, ends or aborts use #PBS -m bea.

Note, don't specify a project_name until accounting is turned on in January 2014.

At this point we change to the working directory and start the job using mpiexec. If the job is solely based on OpenMP and so runs on one node you do not use mpiexec.

3.2 Submitting Jobs

You can choose to explicitly send your job to a given queue using the #PBS -q directive or the qsub -q command.

To see what queues are available use the qstat -q command. Note that not all queues listed by qstat -q are available to users and that the Walltime and Node columns list the maximum runtime and node count for jobs in that queue. Access to the hybrid and shared memory partition is granted on a per project basis during the application process or through the helpdesk.

To submit a PBS script type qsub scriptname.pbs
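For example, to submit a script directly to the development queue (DevQ, the queue used by the interactive example below):

qsub -q DevQ scriptname.pbs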

Sometimes, for debugging purposes, it can be useful to launch a shell as a batch job and get an interactive session on compute nodes where you can see immediately what happens when launching a program. In these cases an interactive job can be used. Note that interactive jobs will only run in the DevQ. For example, if I wanted to test my MPI program on 48 cores, I could request an interactive job for 30 minutes and then be given a shell on one of the 2 compute nodes:

username@fionn1> module load intel-runtime/2013-sp1
username@fionn1> mpiifort -o hello -freeform hello_mpi.f
username@fionn1> qsub -I -l nodes=2:ppn=24,walltime=0:30:00 -V -A sys_test
qsub: waiting for job 28551.service1.cb3.ichec.ie to start
qsub: job 28551.service1.cb3.ichec.ie ready

username@r1i1n5:~> mpiexec ./hello
node 2 :Hello, world
node 3 :Hello, world
node 4 :Hello, world
node 5 :Hello, world
node 6 :Hello, world
node 7 :Hello, world
node 1 :Hello, world
node 0 :Hello, world
node 8 :Hello, world
node 9 :Hello, world
node 14 :Hello, world
node 13 :Hello, world
node 10 :Hello, world
node 15 :Hello, world
node 11 :Hello, world
node 12 :Hello, world
node 16 :Hello, world
node 20 :Hello, world
node 21 :Hello, world
node 19 :Hello, world
....
....
....
node 47 :Hello, world
username@r1i1n5:~>

Note however that this method should only be used for debugging and not for production runs as network breaks or timeouts will kill the job. Also, please exit the shell when you are no longer using the interactive session so that the resources can be released for other users.

3.3 Adding job dependency

It is possible to add a dependency between jobs when submitting them. For example, you may wish a job to run only after a particular job finishes, or only if a particular job fails. Such dependencies can be specified using the -W option of the qsub command.

The syntax for specifying a dependency is -W depend=dependency_list. Below are examples of using it from both a PBS script and the command line.

#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=00:30:00
#PBS -N test_job
#PBS -A sys_test
#PBS -r n
#PBS -j oe
#PBS -W depend=afterok:11111

or from the command line: qsub -l nodes=2:ppn=24,walltime=0:30:00 -V -A sys_test -W depend=afterok:11111

In the above example the line #PBS -W depend=afterok:11111 tells the scheduler to run the current job only if the job with id 11111 completes without any errors. It is important to note that the specified job id must be a valid one, i.e. the job may be running, queued or held, but not already completed. PBS will set the status of the newly submitted job to 'H' (held) instead of 'Q' (queued) until its dependency is satisfied.

There are many other attributes which can be used to specify more complex dependencies among jobs. A detailed list of these attributes and their usage can be found here.
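For instance, the afternotok attribute covers the failure case mentioned above; to submit a clean-up script that runs only if job 11111 terminates with errors (the script name is a placeholder):

qsub -W depend=afternotok:11111 cleanup.pbs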

3.4 Monitoring Jobs

The showq command displays information on the current status of jobs.

showq - status of jobs.

showq -w user=$USER - status of your own jobs only.

showq -w acct=myaccount - status of jobs running under specified account.

To cancel a job you should use the canceljob command.

canceljob JOBID - cancels a job

3.5 Frequently Used Batch System Commands

qsub SUBMIT_SCRIPT          submit jobscript to PBS
qsub -I                     submit an interactive-batch job
qsub -q queue_name          submit job directly to specified queue
qstat -q                    list all queues on system
qstat -Q                    list queue limits for all queues
showq                       list all running, queued and blocked jobs
showq -u userid             list all jobs owned by user userid
showq -w acct=myaccount     list all jobs using the specified project account
showq -r                    list all running jobs
mybalance                   list the balance in CPU core hours for each project you are a member of
qstat -f jobid              list all information known about specified job
canceljob JOBID             delete job JOBID
qalter JOBID                modify the attributes of the job or jobs specified by JOBID

3.6 OpenMP Job Submission

If you wish to run a multithreaded code on a single node which does not use MPI, you can simply call the program from the submission script without prefacing it with the mpiexec command. The job will then have access to the cores on the node. OpenMP-based codes are the most common form of this type of job.

It is possible to write a so-called hybrid code which uses both OpenMP and MPI. This means that a job can use shared memory within a node and MPI between a number of nodes. In this case you generally wish to allocate just one MPI process to each node; this process can then create worker threads to exploit the available cores. To do this you request the required number of nodes in the normal fashion, #PBS -l nodes=n:ppn=24, ensuring ppn is set to 24. Then you launch the job with an additional argument, mpiexec -npernode 1 ./job my_args. With npernode set to 1, a single MPI process is allocated to each node and it is up to this process to use the available cores, as in the sketch below.
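A minimal submission script for this pattern might look as follows (a sketch; my_hybrid_prog is a placeholder and the module line follows the sample script in section 3.1):

#!/bin/bash
#PBS -l nodes=4:ppn=24
#PBS -l walltime=2:00:00
#PBS -N hybrid_job
#PBS -r n
#PBS -j oe

cd $PBS_O_WORKDIR
module load libs intel-runtime/2013-sp1
# one MPI process per node, each using the node's 24 cores via OpenMP threads
export OMP_NUM_THREADS=24
mpiexec -npernode 1 ./my_hybrid_prog my_args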

4. Performance

4.1 Monitoring Job Efficiency and Memory Usage

Users are encouraged to use the command "qutil" to investigate the performance of their running jobs.

qutil [ -u username | -j jobid,... | -a | -h ] [ -s ] - usage

For instance: qutil -a - shows all of your jobs

qutil -j 5675,5677 - lists the two jobs with IDs 5675 and 5677 (but only if they belong to you)

The output from qutil gives a number of useful pieces of information. For each compute node used by a job it lists the 1, 5 and 15 minute load figures. These figures are a rough measure of how high the compute load on each node is. Ideally this value should be roughly equal to the number of cores in the node (24 on Fionn). Not all codes are able to fully utilise all cores all the time, but if the figure is consistently low we recommend you contact us to discuss the implications and what options are open to you to improve it. The efficiency figure listed is based on these values and is normalised such that a best case figure is in the region of 1.0.

The memory utilisation per node is also listed. While this too can vary far too rapidly to be accurately represented over time by a utility like qutil, many HPC codes allocate the bulk of their memory requirements at startup and only release the memory when the job completes, so the figure can be useful. On Fionn each node has 64GB of RAM. qutil allows one to easily compare utilisation on each node in a job.

5. Misc.

5.1 Quotas

Users can check their user and project disk quotas with the lfs quota command. Once the hard quota is exceeded no more data can be written.

username@fionn1:~> lfs quota -g project_code /ichec/work/
Disk quotas for group tclif015b (gid 1110):
Filesystem kbytes quota limit grace files quota limit grace
/ichec/work/ 4731236 1048576000 1153433600 - 305 2097152 4194304 -
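Similarly, your personal quota on the home filesystem can be checked with the -u flag (the path shown assumes the home directory layout described in section 1.5):

lfs quota -u username /ichec/home/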

NOTE: The disk usage figures displayed by the quota system are based on actual disk usage not file size. There will be a minimum of 20% difference between these two figures. Where a lot of extremely small files are present this difference may be more than 100% due to partial disk block use and performance optimisation.

To find out how many resources (core hours) are available to your project, use the mybalance command as follows:

username@fionn1:~> mybalance
Project Machines Balance
--------- --------------- -------
icphy001 ANY 0
icphy001c ANY 0

This command will return the number of core hours available to all your projects (in the above examples, icphy001 and icphy001c). So for instance, if you wish to run a 48 core job for 24 hours, you will need to ensure that you have a minimum of 24*48=1152 core hours on your project's account.

5.2 ICHEC Training

A number of online lectures and tutorials can be found on our website. Please check the Education & Training page for further training courses being organised by ICHEC.

5.3 References and Further Reading