Ireland's High-Performance Computing Centre | ICHEC

User Mailing

ICHEC mail #23

Posted: 2006-06-21

Dear ICHEC users,


1. Network outage
2. Changes to our programming environment
3. Termination of the Transitional Service
4. New version of the taskfarm utility
5. Acknowledging ICHEC in your publications
6. Qpeek – monitoring STDOUT/STDERR of running jobs
7. Call for applications
8. No cost extension
9. Get resources "for free"

1 – Network outage

The HEAnet network circuits which connect the ICHEC systems to the outside world will be down for approximately 15 minutes on Wednesday (21/06/2006) to facilitate the movement of some HEAnet equipment. The outage should occur some time between 6pm and 10pm. During this period users will be unable to log in to Walton or Hamilton or to access the ICHEC website. The systems will continue to run normally and no user jobs will be affected.

See the ICHEC systems status page for further information.

2 – Changes to our programming environment

ICHEC has recently purchased licenses for the Pathscale EKOPATH Compiler suite for AMD64 (see http://www.pathscale.com), including C, C++, and Fortran 77/90/95 compilers. This environment has now been tested and rolled out to our production cluster.

New versions of the AMD Core Math Library (ACML) and MPICH have been deployed as part of this new environment, which can be loaded using the "module" command:

module load pathscale
module load mpich/path
(Note: the MPICH2 module for Pathscale will be installed by 1st July.)

The location of the relevant ACML libraries is /opt/packages/path-compat/

We recommend that you choose this new environment instead of the current Portland Group environment. The latter will be left on our cluster for backwards compatibility, but will no longer be actively supported. A number of ICHEC's supported packages have been reported to run substantially faster with the Pathscale compilers. New versions of our supported packages will therefore be built under the Pathscale / MPICH2 environment.

ICHEC has also installed the Intel Trace Analyser and Collector, a powerful tool to analyse and optimise parallel applications on Hamilton. See http://www.intel.com/cd/software/products/asmo-na/eng/cluster/tanalyzer/index.htm

3 – Termination of the Transitional Service

Note: The changes described in this section will not affect CosmoGrid projects. Contact Thibaut Lery at DIAS should you have any queries regarding CosmoGrid access.

The Transitional Service ended on 31st May as initially planned.

Following a number of requests, we have extended login access to our systems by one week (until Friday 23rd June) to allow users who have not yet gained access under the Full National Service to transfer their files back to their home institutions. This extension applies to both Walton and Hamilton. After this date, their scratch/work directories will be deleted and their login disabled. Home directories and Web accounts will be preserved to facilitate the return of users who intend to gain access through the Full National Service at a later date.

4 – New version of the taskfarm utility

A couple of users have reported deadlocks with our taskfarm utility. We have therefore improved the utility to avoid this problem.

The new taskfarm is located at /opt/packages/taskfarm/taskfarm. This new version is MPICH2-based, so users will need to specify the communication type for mpiexec. This can be done with a command line option to mpiexec:

mpiexec -comm mpich2-pmi /opt/packages/taskfarm/taskfarm task-file

Alternatively, users can set the MPIEXEC_COMM environment variable or load the taskfarm module (module load taskfarm). The taskfarm module also adds the taskfarm to the user's PATH.

The documentation at http://www.ichec.ie/support/documentation/task_farming will be updated very shortly to reflect these changes.

5 – Acknowledging ICHEC in your publications

We would appreciate a formal acknowledgement of ICHEC by inclusion in any resulting publications of the following sentence or some variation thereof:

"The authors wish to acknowledge the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support."

See our FAQ.

We would also appreciate it if you could let us know when such publications are accepted, as this is one of the major metrics by which the scientific impact of ICHEC will be assessed.

The list of publications will be kept up to date at http://www.ichec.ie/publications.

6 – Qpeek – monitoring STDOUT/STDERR of running jobs

To examine stdout and/or stderr (what would normally be written to console) of a running job you can use the qpeek utility. In order to use this you must first set up ssh keys so that you can log on to compute nodes without typing a passphrase (users can ssh to compute nodes associated with their own running jobs for the duration of those jobs):

prompt>$ ssh-keygen -t dsa
(When it asks for a passphrase just hit return. Also, accept default file locations.)

prompt>$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

l2cu27 $ qpeek -?
qpeek: Peek into a job's output spool files

Usage: qpeek [options] JOBID

-c Show all of the output file ("cat", default)
-h Show only the beginning of the output file ("head")
-t Show only the end of the output file ("tail")
-f Show only the end of the file and keep listening ("tail -f")
-# Show only # lines of output
-e Show the stderr file of the job
-o Show the stdout file of the job (default)
-? Display this help message

For example, to stream the output of a job with ID 20010:

qpeek -f 20010

7 – Call for applications

We would like to invite all PIs of projects supported under the Transitional Service who have not yet made an application under the Full National Service to do so.

We are aware that the popularity of the Transitional Service had the side effect of increasing the overall job turnaround time beyond what many users considered acceptable.

We would like to reassure such users that the implementation of the fair share policy and accounting mechanisms has resulted in a much improved turnaround time. The fair share mechanism ensures that users who have had limited access to our systems get their jobs promoted up the queues, thus ensuring a faster turnaround time and equal access for all.

See http://www.ichec.ie/application_guidelines.

8 – No cost extension

PIs from Class B and Class C projects with a starting date before 1st June 2006 may request a no cost extension of their project by contacting us through the helpdesk. As the 12 and 4 month limits still apply, this means that all Class C PIs may extend their access until 30th September 2006, and all Class B PIs may do so until 30th May 2007.

No additional resources will be granted as part of this extension.

9 – Get resources "for free"

ICHEC and the CosmoGrid project have agreed on a new policy which will result in some "free" resources being made available to ICHEC Class A/B/C projects. Relevant excerpts of this agreement are as follows:

A proposed scheme for flexible resource allocation

The scheme will address three priorities:

1. to maximise the overall utilisation of ICHEC resources;
2. to create an incentive to use resources at an early stage of the project;
3. to ensure that CosmoGrid projects fully avail of resources owned by their project.

The scheme implements a scheduling policy based on fair share, which will facilitate CosmoGrid's access to their share of the resources.

In this model, unclaimed CosmoGrid resources will be made available to other users "for free", pro-rata to their own usage over the past month. Take the utilisation data from May 2006 as an example. In May 2006, the number of cycles "owned by CosmoGrid" on Walton was:

31 days * 24 hours * 95.6% (availability) * 40.7% (CosmoGrid share) * 932 (average #CPUs up) = 269,800 CPU hours.

The total amount of resources used by CosmoGrid projects over this period amounted to 41,087 CPU hours, thus leaving a total of 228,713 unused CPU hours.
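The arithmetic above can be checked with a short script. This is only a sketch: the variable and function names are made up for illustration, and the figures are the May 2006 values quoted in this mailing.

```python
# Illustrative sketch only: names are hypothetical, figures are the May 2006
# values quoted in this mailing.
DAYS_IN_MONTH = 31
HOURS_PER_DAY = 24
AVAILABILITY = 0.956        # 95.6% availability
COSMOGRID_SHARE = 0.407     # 40.7% CosmoGrid share
AVERAGE_CPUS_UP = 932       # average number of CPUs up
COSMOGRID_USED = 41_087     # CPU hours used by CosmoGrid projects


def owned_cpu_hours(days, availability, share, cpus):
    """CPU hours owned by CosmoGrid for a month of the given length."""
    return days * HOURS_PER_DAY * availability * share * cpus


# Round to whole CPU hours, as in the quoted figure.
owned = round(owned_cpu_hours(DAYS_IN_MONTH, AVAILABILITY,
                              COSMOGRID_SHARE, AVERAGE_CPUS_UP))
unused = owned - COSMOGRID_USED

print(f"Owned by CosmoGrid: {owned:,} CPU hours")
print(f"Unused:             {unused:,} CPU hours")
```

Rounded to whole hours, this reproduces the 269,800 and 228,713 CPU hours quoted above.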

The proposal is therefore to reimburse these 228,713 CPU hours to our non-CosmoGrid users, pro-rata to their usage for the month. All projects who have consumed resources over the past month would therefore benefit. Restricting our example to the Top 10 projects (May 2006 data), this scheme would have the following effect:

| Project   | PI               | Res. used (CPU h) | % use (SFI) | Discount (CPU h) | Res. charged (CPU h) |
| tcphy001b | Stefano Sanvito  | 208,949           | 35.2        | 80,520           | 128,429              |
| tiche001b | Simon Elliott    | 91,909            | 15.5        | 35,418           | 56,492               |
| tiche002  | Jim Greer        | 82,616            | 13.9        | 31,837           | 50,780               |
| ndche001  | Maxim Fedorov    | 38,339            | 6.5         | 14,774           | 23,565               |
| tiche001  | Damien Thompson  | 37,329            | 6.3         | 14,385           | 22,944               |
| tiphy001b | Geoffrey Stenuit | 32,605            | 5.5         | 12,564           | 20,040               |
| tcphy001  | SC Das Pemmaraju | 21,197            | 3.6         | 8,169            | 13,029               |
| ndeng001  | Scott Rickard    | 19,929            | 3.4         | 7,680            | 12,249               |
| dccom002c | Heather Ruskin   | 15,236            | 2.6         | 5,871            | 9,365                |
| ndlif002c | Richard Edwards  | 11,885            | 2.0         | 4,580            | 7,305                |

Under this scheme, the most active project (tcphy001b), which used 208,949 CPU hours, or 35.2% of the non-CosmoGrid usage, would qualify for a discount of 228,713 * 35.2%, or 80,520 CPU hours. The project would therefore be charged for only 128,429 CPU hours.
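The pro-rata calculation can be sketched as follows. The function name is hypothetical and the script is an illustration, not the accounting code used by ICHEC. Note that the rounded share of 35.2% gives a slightly smaller discount than the 80,520 CPU hours quoted; the quoted figure was presumably computed from the unrounded share, which is not given in this mailing.

```python
# Illustrative sketch only: pro_rata_discount is a hypothetical helper, and
# the figures are the May 2006 values quoted in this mailing.
UNUSED_COSMOGRID_HOURS = 228_713   # unused CosmoGrid CPU hours
PROJECT_USED = 208_949             # CPU hours used by the most active project
PROJECT_SHARE_PERCENT = 35.2       # rounded share of non-CosmoGrid usage
QUOTED_DISCOUNT = 80_520           # discount quoted in the table


def pro_rata_discount(unused_hours, usage_share_percent):
    """Discount in CPU hours, proportional to the project's usage share."""
    return unused_hours * usage_share_percent / 100.0


# Discount obtained from the rounded 35.2% share.
discount_from_rounded_share = round(
    pro_rata_discount(UNUSED_COSMOGRID_HOURS, PROJECT_SHARE_PERCENT))

# Hours charged when the quoted discount is applied.
charged = PROJECT_USED - QUOTED_DISCOUNT

print(f"Discount from rounded share: {discount_from_rounded_share:,} CPU hours")
print(f"Charged with quoted discount: {charged:,} CPU hours")
```

With the quoted discount, the charged figure comes out at 128,429 CPU hours, matching the table.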

This new policy is effective immediately, and will be applied retrospectively to the June 2006 usage.
