Welcome to the ICHEC National Capability Service. This service is provided using one cabinet of IBM Blue Gene/P. A cabinet of Blue Gene/L was also available; however, this has now been decommissioned.
The Blue Gene/P has 1024 nodes, each with four fully cache-coherent cores and 2GB RAM. These nodes provide three modes of operation: SMP, Dual and Virtual Node. Respectively, these give you a single MPI task with support for four threads, two MPI tasks with support for two threads each, or four MPI tasks. Running on 512 nodes in Virtual Node mode therefore gives you 2048 MPI tasks (512 x 4). The minimum partition size is 32 nodes. Despite this, we try to ensure that allocations use larger power-of-two sizes in order to avoid fragmenting the system and to provide the maximum per-job capability as intended.
A single front-end login node with 16 1.8GHz Power5+ cores and 64GB RAM is provided for development, along with some pre- and post-processing of data.
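The mode arithmetic above can be sketched in a few lines of Python. This is only an illustration: `MODES`, `job_layout` and `is_preferred_partition` are hypothetical helper names, not part of any Blue Gene tooling. The memory figure simply divides the 2GB of node RAM evenly between tasks and ignores anything reserved by the system. The power-of-two check is an assumption based on the stated preference for binary power sizes, bounded by the 32-node minimum and the 1024-node cabinet.

```python
# Hypothetical helpers illustrating the three Blue Gene/P node modes.
# mode name -> (MPI tasks per node, threads available per task)
MODES = {
    "smp": (1, 4),   # one task, up to four threads
    "dual": (2, 2),  # two tasks, up to two threads each
    "vn": (4, 1),    # four tasks, one thread each (Virtual Node mode)
}

NODE_MEMORY_MB = 2048  # 2GB RAM per node, as stated in the text


def job_layout(nodes, mode):
    """Return task count, threads per task and a rough memory share per task."""
    tasks_per_node, threads_per_task = MODES[mode]
    return {
        "mpi_tasks": nodes * tasks_per_node,
        "threads_per_task": threads_per_task,
        # Even split of node RAM; an upper bound, not a guaranteed figure.
        "memory_per_task_mb": NODE_MEMORY_MB // tasks_per_node,
    }


def is_preferred_partition(nodes, minimum=32, maximum=1024):
    """Assumed policy check: a power of two between the minimum and the cabinet size."""
    in_range = minimum <= nodes <= maximum
    power_of_two = nodes > 0 and (nodes & (nodes - 1)) == 0
    return in_range and power_of_two
```

For example, `job_layout(512, "vn")` reports 2048 MPI tasks with one thread each, matching the figure quoted above.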
Storage is provided by 33TB (formatted) of tightly-integrated SAN running the IBM GPFS filesystem. In ideal cases the storage should be able to provide 1GB/sec of I/O from the Blue Gene cabinet. This should make the use of large checkpoint files both feasible and relatively efficient.
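To get a feel for what the 1GB/sec figure means for checkpointing, the following sketch estimates a best-case write time. It rests on several assumptions: the ideal bandwidth is sustained for the whole write, the checkpoint dumps all of the memory on every node (2GB each), and no other job is using the filesystem. The function names are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope checkpoint timing under ideal conditions.
IDEAL_BANDWIDTH_GB_PER_S = 1.0  # ideal-case figure quoted in the text
NODE_MEMORY_GB = 2              # RAM per Blue Gene/P node


def full_memory_checkpoint_gb(nodes, node_memory_gb=NODE_MEMORY_GB):
    """Size of a checkpoint that writes out all memory on every node."""
    return nodes * node_memory_gb


def checkpoint_seconds(size_gb, bandwidth_gb_per_s=IDEAL_BANDWIDTH_GB_PER_S):
    """Best-case time to write size_gb at the given sustained bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s
```

Under these assumptions a 512-node job writing all of its memory produces 1024GB and needs about 1024 seconds, roughly 17 minutes, at best. Real checkpoints are usually smaller than total memory, and real bandwidth is usually lower than the ideal figure.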
The National Capability (Blue Gene) Service uses the same online Helpdesk as the other standard ICHEC services. Any queries logged there will be assigned to a staff member who will assist you.
IMPORTANT: Please note that the National Capability (Blue Gene) Service does not provide any backup facility for user data. Users must therefore back up important scientific data themselves.
Users can connect to the login node as follows:
When a user logs in to the login node they normally need to enable the development environment. This can be done by loading a module file as follows:
Environment modules can also be listed (available or loaded), shown, unloaded, or swapped:
The Blue Gene system uses the IBM XL compiler suite (XL C/C++ version 9.0 and XL Fortran version 11.1). Our default version is made available as part of the environment module (above).
When compiling MPI programs for the Blue Gene the IBM XL compilers should be used. Specific MPI wrappers are provided. Some examples:
Again, the environment module (above) ensures that you are using the correct MPI wrapper.
The Blue Gene system uses the IBM LoadLeveler batch processing system.
The current configuration of the queues (llclass) is such that the Blue Gene/P is providing long runtime production queues for large node counts. This policy is subject to change and will be driven by project requirements throughout the life of the service.
The most important commands are as follows:
| Command | Description |
|---|---|
| llstatus | Show the LoadLeveler status |
| llclass | List the available classes (queues) |
| llq | List running and queued jobs |
| llsubmit jobscript | Submit a jobscript to LoadLeveler |
| llcancel | Cancel a running or queued job |
Below is a sample LoadLeveler submit script:
An appropriate class can be chosen from the list displayed by llclass.
We provide three types of storage to projects on the Blue Gene system. They all share the same filesystem but are differentiated by policy. Each user gets their own home directory with a small quota. Each project gets a large project work directory at /ichec/work/projectname, shared among its members. Both of these persist for the life of the project. Projects also have access to a scratch space at /ichec/scratch/projectname. This scratch space is non-persistent and should be used for short-term storage of data such as large checkpoint-restart files.
To check quotas on any of these use the quota command.
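As a small illustration of the layout described above, the sketch below builds the work and scratch paths for a project. The helper name `project_dirs` and the project name used in the usage note are hypothetical. The two root paths are the ones given in the text.

```python
from pathlib import PurePosixPath

# Roots taken from the storage description above.
WORK_ROOT = PurePosixPath("/ichec/work")        # persistent for the life of the project
SCRATCH_ROOT = PurePosixPath("/ichec/scratch")  # non-persistent, short-term data only


def project_dirs(projectname):
    """Return the work and scratch directories for a project (illustrative helper)."""
    if not projectname or "/" in projectname:
        raise ValueError("projectname must be a single, non-empty path component")
    return {
        "work": WORK_ROOT / projectname,
        "scratch": SCRATCH_ROOT / projectname,
    }
```

A job script generator could, for instance, write checkpoint-restart files under `project_dirs("myproject")["scratch"]` and copy final results to the `"work"` entry, since only the latter is persistent.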
One major issue that new users on Blue Gene systems will encounter is stack overflow. The Blue Gene/P system does have some stack protection in the form of a 4KB fence page, but code can stride over this page without touching it. It is always worth keeping this in mind when developing code and running new problems.
By default the Blue Gene/P generates a core dump for every process. This behaviour can be altered so that only one core dump is generated, for the first process, by adding an environment setting to the @arguments section of the submit script.
This should save you a lot of file cleanup if you don't need the core dumps for debugging purposes. The coreprocessor utility can be helpful when examining core dumps. It can be found in the following location: /bgsys/drivers/ppcfloor/tools/coreprocessor/coreprocessor.pl.
Some users may find that some Blue Gene/P jobs exit with a segmentation fault and signal 7 (SIGBUS). This is due to an undocumented feature on the Blue Gene/P that deals with memory alignment during I/O operations. This feature allows the user to set a memory alignment error threshold for I/O operations in their job, but it causes confusion because of its default value of 1000. This threshold can be controlled using an environment variable.
Die as soon as a memory alignment error is encountered during I/O:
Silence all memory alignment errors during I/O:
Configure a threshold to work around occasional memory alignment errors during I/O (e.g. 2000 in this case):
Note that memory alignment errors during I/O can reduce I/O performance, so it is better to eliminate them than to ignore them where possible.