Ireland's High-Performance Computing Centre | ICHEC

User Mailing

ICHEC mail #9

Posted: 2005-11-11

Dear ICHEC users,


1. Unplanned break in service
2. Quotas on home directories
3. High-performance scratch areas
4. Stability issues on Walton
5. Updated scheduling policies
6. Guidelines for reporting issues to the helpdesk
7. Courses
8. Full National Service
9. Future activities

1 - Unplanned break in service

Service on our systems was recently interrupted by an unannounced power cut at the hosting site. Backup generators allowed us to bring the systems down cleanly, but all running jobs were unfortunately killed. Service should resume by 3.00pm (Friday 11th).

See http://www.ichec.ie/status

2 - Quotas on home directories

Quotas have been introduced on home directories: all users now have a 10GB limit on their home space. Additional space is available to all users under /ichec/work/projectname/, where "projectname" is the name of your project (for instance, icphy003). If you have forgotten the name of the project you are affiliated with, the UNIX command "groups" will return a list of the projects to which you belong. The one you are looking for has the form IIAAANNN, where II is your PI's institution, AAA your scientific discipline, and NNN a three-digit number (in the example above, ic for ICHEC, phy for Physics, and 003).
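As an illustration, the project group can be picked out of the "groups" output with a small shell filter; the group names in the sample list below are made up for the example.

```shell
#!/bin/sh
# Filter project groups (pattern IIAAANNN: two-letter institution code,
# three-letter discipline code, three digits) from a list of groups.
# On the clusters you would use: group_list=$(groups)
# The sample list here is illustrative only.
group_list="users icphy003 staff tcche012"
for g in $group_list; do
    case "$g" in
        [a-z][a-z][a-z][a-z][a-z][0-9][0-9][0-9]) echo "$g" ;;
    esac
done
# prints: icphy003 and tcche012, one per line
```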

Quotas have also been set on work directories. Their levels correspond to the resources requested at the time of the application. We have added a new entry in our "Frequently Asked Questions" which describes a number of useful commands to find out about your quotas.
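Pending that FAQ entry, the generic commands below give an idea of how to check your usage; the exact commands available on our systems may differ, and the GPFS filesystem name is a placeholder.

```shell
#!/bin/sh
# Generic ways to inspect disk usage and quotas; availability depends
# on the filesystem, so treat these as illustrative.
quota -s 2>/dev/null || echo "(quota command not available or no quota set)"
# On GPFS (as used on Walton) the equivalent is mmlsquota; the
# filesystem name "gpfs0" below is an assumption:
#   mmlsquota gpfs0
du -sh "$HOME" 2>/dev/null    # how much your home directory currently holds
df -h "$HOME" | tail -1       # free space on the home filesystem
```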

3 - High-performance scratch areas

Access to high-performance scratch areas is now provided at runtime. Once PBS starts a job, a per-job temporary directory is created on each node. You can access these directories from your PBS scripts through the environment variable TMPDIR, which is set to /localscratch/pbstmp.$PBS_JOBID.

It is worth emphasising that:

- these directories are not in a shared filesystem (they are private to each node); and
- they are deleted at the end of a job, so it is your script's responsibility to copy any data you need back to your home or work directory before the PBS script finishes.
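A minimal job-script sketch along these lines (the job name, file names, and destination are placeholders; when run outside PBS the script falls back to /tmp so it can be tried standalone):

```shell
#!/bin/sh
#PBS -N scratch-example
# Use the per-job local scratch area, then copy results back.
# PBS sets PBS_JOBID and creates /localscratch/pbstmp.$PBS_JOBID;
# outside PBS we simulate this under /tmp so the sketch is runnable.
if [ -n "$PBS_JOBID" ]; then
    scratch=/localscratch/pbstmp.$PBS_JOBID
else
    scratch=$(mktemp -d /tmp/pbstmp.XXXXXX)
fi

cd "$scratch" || exit 1
echo "intermediate data" > work.dat      # stand-in for the real computation

# The scratch area is node-local and removed when the job ends, so copy
# anything worth keeping back to home or work space before exiting.
cp work.dat /tmp/results.dat             # replace /tmp with $HOME or /ichec/work/...
echo "results copied to /tmp/results.dat"
```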

4 - Stability issues on Walton

Walton has recently suffered a number of instability issues which resulted in user jobs crashing for no apparent reason. This instability was caused by the following problems:

- inconsistencies in the GPFS filesystem across certain nodes (filesystem not mounted properly on a few nodes);
- kernel boot parameters (power saving features enabled) which caused unexplained kernel panics.

The first problem has now been rectified, and fixes are progressively being applied through the cluster to resolve the kernel panics. We expect these fixes to be rolled out through the whole cluster by Friday afternoon.

5 - Updated scheduling policies

The production region on Hamilton has been increased from 20 to 24 CPUs, leaving only 8 CPUs for development. This decision will be reviewed after our first OpenMP course.

6 - Guidelines for reporting issues to the helpdesk

Users reporting unexplained crashes through the helpdesk should remember to include the following information:

- the job ID (as allocated by PBS; see the output/error files);
- the modules loaded when building the executable (compilers and libraries; for MPI jobs, clearly indicate which MPI module was loaded);
- a copy of the PBS script used to submit the job.
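These three items can be bundled into a single attachment with a short script like the one below; the job ID and the script name job.pbs are placeholders for your own.

```shell
#!/bin/sh
# Collect the information the helpdesk asks for into one report file.
# Replace the job ID and job.pbs with your own; both are placeholders.
report=/tmp/helpdesk-report.txt
{
    echo "Job ID: 12345.hamilton"         # taken from the PBS output/error files
    echo "--- loaded modules ---"
    module list 2>&1 || echo "(module command not available)"
    echo "--- submission script ---"
    cat job.pbs 2>/dev/null || echo "(job.pbs not found)"
} > "$report"
echo "wrote $report"
```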

7 - Courses

ICHEC will be running its three-day HPC course ("Introduction to HPC" and "Introduction to MPI") at NUI Galway from 15th to 17th November. Further details, as well as the application form, can be found on our web site.

Courses will also take place in Cork and Dublin in late November/early December (exact dates to be confirmed).

8 - Full National Service

Guidelines for the Full National Service (to start in early January 2006) are currently being finalised. A formal call for applications will be issued during the week starting 14th November. Further details will be published on our web site very shortly.

9 - Future activities

a/ Note that accounting will be enabled across our infrastructure later this month. We do not expect this to cause any disruptions to the service.

b/ The debugging tool DDT has been tested extensively across our cluster. We are currently working on an introductory user guide on debugging parallel applications with DDT. This document will be published on our web site by the end of next week.

c/ Preparations are also under way for our first ICHEC User Group Meeting (UGM). This meeting will take place in Dublin in early December. We hope that many of you will be able to attend this meeting to give us your feedback.
