Ireland's High-Performance Computing Centre | ICHEC

Using the ICHEC Blue Gene System

Contents

  1. Introduction to the ICHEC National Capability (Blue Gene) System
  2. Brief Hardware Overview
  3. Getting Help
  4. Backup Policy
  5. Development Environment
  6. Batch Processing
  7. Quotas
  8. Stack Overflow
  9. Core Dumps
  10. Signal 7 Issue

1. Introduction to the ICHEC National Capability (Blue Gene) System

Welcome to the ICHEC National Capability Service. This service is provided using one cabinet of IBM Blue Gene/P. A cabinet of Blue Gene/L was also available; however, it has now been decommissioned.

2. Brief Hardware Overview

The Blue Gene/P has 1024 nodes, each with four fully cache-coherent cores and 2GB RAM. These nodes provide three modes of operation: SMP, Dual and Virtual Node. Respectively, these give you a single MPI task with support for four threads, two MPI tasks with support for two threads each, or four MPI tasks. Running on 512 nodes in virtual-node mode, for example, gives you 2048 MPI tasks. The minimum partition size is 32 nodes. Despite this, we try to ensure that allocations use larger power-of-two sizes, both to avoid fragmenting the system and to provide the maximum per-job capability as intended.
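The task counts for the three modes described above follow from a simple multiplication, sketched here as a small shell helper. The function name bg_mpi_tasks is hypothetical, and the mode spellings SMP, DUAL and VN are assumed to match those accepted by mpirun -mode.

```shell
# Number of MPI tasks for a partition: nodes x tasks-per-node.
# Per-node task counts come from the hardware overview:
#   SMP = 1 task, DUAL = 2 tasks, VN (virtual node) = 4 tasks.
# bg_mpi_tasks is a hypothetical helper, not a system command.
bg_mpi_tasks() {
    bg_nodes=$1
    bg_mode=$2
    case "$bg_mode" in
        SMP)  bg_per_node=1 ;;
        DUAL) bg_per_node=2 ;;
        VN)   bg_per_node=4 ;;
        *)    echo "unknown mode: $bg_mode" >&2; return 1 ;;
    esac
    echo $((bg_nodes * bg_per_node))
}

bg_mpi_tasks 512 VN     # prints 2048
bg_mpi_tasks 512 DUAL   # prints 1024
bg_mpi_tasks 32 SMP     # prints 32
```

The result is the value you would normally pass to mpirun with -np for a partition of that size.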

The Blue Gene/P has a single front-end login node with 16 1.8GHz Power5+ cores and 64GB RAM. It is provided for development along with some pre- and post-processing of data.

Storage is provided by 33TB (formatted) of tightly-integrated SAN running the IBM GPFS filesystem. In ideal cases the storage should be able to provide 1GB/sec of I/O from the Blue Gene cabinet. This should make the use of large checkpoint files both feasible and relatively efficient.
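To get a feel for what this means in practice, the following sketch estimates a lower bound on checkpoint time. It assumes the ideal 1GB/sec rate quoted above and a worst-case checkpoint as large as the full memory of the partition (2GB per node); real checkpoints are usually smaller and real I/O rates lower. The function name checkpoint_seconds is hypothetical.

```shell
# Rough lower bound on the time to write one checkpoint.
# Assumptions: the whole memory of every node is written, and the
# storage delivers the ideal aggregate rate of 1 GB/sec.
checkpoint_seconds() {
    ck_nodes=$1          # nodes in the partition
    ck_gb_per_node=2     # RAM per Blue Gene/P node
    ck_gb_per_sec=1      # ideal aggregate I/O rate
    echo $((ck_nodes * ck_gb_per_node / ck_gb_per_sec))
}

checkpoint_seconds 512     # prints 1024 (about 17 minutes)
checkpoint_seconds 1024    # prints 2048 (about 34 minutes)
```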

3. Getting Help

The National Capability (Blue Gene) Service uses the same online Helpdesk as the other standard ICHEC services. Any queries logged there will be assigned to a staff member who will assist you.

4. Backup Policy

IMPORTANT: Please note that the National Capability (Blue Gene) Service does not provide any backup facility for user data. Users must therefore back up important scientific data themselves.

5. Development Environment

5.1 Login and Modules

Users can connect to the login node as follows:

ssh bgp.ichec.ie

When a user logs in to the login node they normally need to enable the development environment. This can be done by loading a module file as follows:

module load bgp

Environment modules can also be listed (available or loaded), shown, unloaded or swapped. For example:

module avail
module list
module unload bgp

5.2 Compilers

The Blue Gene system uses the IBM XL compiler suite (XL C/C++ version 9.0 and XL Fortran version 11.1). Our default version is made available as part of the environment module (above).

When compiling MPI programs for the Blue Gene the IBM XL compilers should be used. Specific MPI wrappers are provided. Some examples:

mpixlc mpixlcxx mpixlf90

Again, the environment module (above) ensures that you are using the correct MPI wrapper.
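If a compile fails with "command not found", a quick check like the following can confirm whether the wrappers are on your PATH. This is only a sketch: check_mpi_wrappers is a hypothetical helper, and it tests just the three wrapper names listed above.

```shell
# Report where each MPI compiler wrapper resolves to, if anywhere.
# check_mpi_wrappers is a hypothetical helper, not part of the bgp module.
check_mpi_wrappers() {
    missing=0
    for wrapper in mpixlc mpixlcxx mpixlf90; do
        if wrapper_path=$(command -v "$wrapper"); then
            echo "$wrapper -> $wrapper_path"
        else
            echo "$wrapper not found on PATH"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every wrapper was found
    [ "$missing" -eq 0 ]
}

check_mpi_wrappers || echo "Try 'module load bgp' and run the check again."
```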

6. Batch Processing

6.1 LoadLeveler

The Blue Gene system uses the IBM LoadLeveler batch processing system.

The queues (classes, as listed by llclass) are currently configured so that the Blue Gene/P provides long-runtime production queues for large node counts. This policy is subject to change and will be driven by project requirements throughout the life of the service.

6.2 LoadLeveler Command Summary

The most important commands are as follows:

llstatus             Show the LoadLeveler status
llclass              List the available classes (queues)
llq                  List running and queued jobs
llsubmit jobscript   Submit a jobscript to LoadLeveler
llcancel             Cancel a running or queued job

6.3 Sample LoadLeveler Script

Below is a sample LoadLeveler submit script:

#@ job_type = bluegene
#
# Specify number of nodes required, make sure it tallies with mpirun arguments and class submitted to
#@ bg_size = 512
#
# Specify which type of BG to run on, (most ll commands have a -X argument for cluster usage)
#@ cluster_list = BG/P
#
#@ input = /dev/null
#@ output = $(jobid).out
#@ error = $(jobid).err
#@ wall_clock_limit=00:30:00
#
# Use the llclass command to see what queues are available to submit to
#@ class = 512_48hrs
#
# Change the value for account_no to your project code
#@ account_no = MY_ACCOUNT_ID
#@ queue

# See IBM Blue Gene documentation for mpirun arguments or use mpirun -h
/bgsys/drivers/ppcfloor/bin/mpirun -np 1024 -mode DUAL -env "OMP_NUM_THREADS=2" -mapfile TXYZ -cwd $PWD -exe $PWD/a.out -args my_args

An appropriate class can be chosen from the list displayed by llclass.
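Since bg_size must tally with the mpirun arguments, it can help to check a script before submitting it. The sketch below is one possible check, not a LoadLeveler feature: check_job_script is a hypothetical helper, it assumes a single mpirun line with explicit -np and -mode options, and it assumes the mode spellings SMP, DUAL and VN.

```shell
# Check that bg_size x tasks-per-node equals the -np given to mpirun.
# check_job_script is a hypothetical helper, not part of LoadLeveler.
check_job_script() {
    script=$1
    # "#@ bg_size = 512" -> 512
    size=$(sed -n 's/^#@ *bg_size *= *\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    # "... -np 1024 ..." -> 1024 (comment lines are skipped)
    np=$(sed -n '/^#/!s/.* -np  *\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    # "... -mode DUAL ..." -> DUAL (comment lines are skipped)
    mode=$(sed -n '/^#/!s/.* -mode  *\([A-Za-z][A-Za-z]*\).*/\1/p' "$script" | head -n 1)
    case "$mode" in
        SMP)  per_node=1 ;;
        DUAL) per_node=2 ;;
        VN)   per_node=4 ;;
        *)    echo "could not determine -mode" >&2; return 2 ;;
    esac
    if [ -z "$size" ] || [ -z "$np" ]; then
        echo "could not find bg_size or -np" >&2
        return 2
    fi
    expected=$((size * per_node))
    if [ "$np" -eq "$expected" ]; then
        echo "OK: bg_size=$size mode=$mode -np=$np"
    else
        echo "MISMATCH: bg_size=$size mode=$mode expects -np $expected, found $np"
        return 1
    fi
}

# Demonstration on a cut-down copy of the sample script
demo=$(mktemp)
cat > "$demo" <<'EOF'
#@ job_type = bluegene
#@ bg_size = 512
#@ queue
/bgsys/drivers/ppcfloor/bin/mpirun -np 1024 -mode DUAL -cwd $PWD -exe $PWD/a.out
EOF
check_job_script "$demo"    # prints: OK: bg_size=512 mode=DUAL -np=1024
rm -f "$demo"
```

The check only compares the node count with the task count; it does not verify that the chosen class accepts that partition size.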

7. Quotas

We provide three types of storage to projects on the Blue Gene system. They all share the same filesystem but are differentiated by policy. Each user gets their own home directory with a small quota. Each project gets a large project work directory at /ichec/work/projectname, shared among its members. Both of these are persistent over the life of the project. Projects also have access to a scratch space at /ichec/scratch/projectname. This scratch space is non-persistent and should be used for short-term storage of data such as large checkpoint-restart files.

To check quotas on any of these use the quota command.

8. Stack Overflow

One major issue that new users on Blue Gene systems will encounter is stack overflow. The Blue Gene/P system does have some stack protection in the form of a 4KB fence page, but code can stride over it: for example, a large automatic (stack-allocated) array can move the stack pointer past the fence page without ever touching it. It is always worth keeping this in mind when developing code and running new problems.

9. Core Dumps

By default the Blue Gene/P generates a core dump for every process. This behaviour can be altered so that only one core dump is generated, for the first process, by adding an environment setting to the mpirun arguments (the @arguments section) of the submit script:

-env BG_APP_COREDUMPDISABLED=1

This should save you a lot of file cleanup if you don't need the core dumps for debugging purposes. The coreprocessor utility can be helpful when examining core dumps. It can be found in the following location: /bgsys/drivers/ppcfloor/tools/coreprocessor/coreprocessor.pl.
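If a job has already left core dumps behind, a sketch like the following can show how many there are before you decide what to delete. It assumes the core files are named core.<rank> (core.0, core.1, ...) and sit in the job's working directory; count_core_files is a hypothetical helper.

```shell
# Count core files in a run directory (default: current directory).
# Assumption: core files are named core.<rank>, e.g. core.0, core.1.
# count_core_files is a hypothetical helper, not a system command.
count_core_files() {
    core_dir=${1:-.}
    find "$core_dir" -maxdepth 1 -type f -name 'core.[0-9]*' | wc -l | tr -d ' '
}

echo "core files here: $(count_core_files .)"

# To remove them, review the list first, then uncomment the second line:
# find . -maxdepth 1 -type f -name 'core.[0-9]*' -print
# find . -maxdepth 1 -type f -name 'core.[0-9]*' -exec rm -f {} +
```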

10. Signal 7 Issue

Some users may find that Blue Gene/P jobs exit with a segmentation fault and signal 7 (SIGBUS). This is due to an undocumented feature of the Blue Gene/P that deals with memory alignment during I/O operations. The feature allows the user to set a memory alignment error threshold for I/O operations in their job, but it causes confusion because of its default value of 1000. The threshold can be controlled using an environment variable.

Die as soon as a memory alignment error is encountered during I/O:

-env BG_MAXALIGNEXP=0

Silence all memory alignment errors during I/O:

-env BG_MAXALIGNEXP=-1

Configure a threshold to work around occasional memory alignment errors during I/O (e.g. 2000 in this case):

-env BG_MAXALIGNEXP=2000

Note that memory alignment errors during I/O can reduce I/O performance, so where possible it is better to eliminate them than to ignore them.