HPC System Administrator
Applications are invited from suitably qualified candidates for a full-time, fixed-term position as an HPC System Administrator with the Irish Centre for High End Computing (ICHEC) at the University of Galway.
This position is available from January 2023 to work on high-performance computing and data management projects. The candidate will be based at our offices in Dublin or Galway, with the option of a hybrid work-from-home arrangement.
Interested candidates with the qualifications specified below should contact firstname.lastname@example.org for further details.
Selected responsibilities and duties for this post include, but are not limited to:
- Diagnosing and resolving faults on HPC and associated systems
- Interact with users and computational scientists in diagnosing and resolving problems with applications
- Understand the complex software and hardware stack comprising HPC clusters in order to troubleshoot and fix underlying faults or performance issues
- Work with supplier technical support when required to resolve issues
- Operation of HPC and associated systems
- Automation of lifecycle management of user accounts and projects
- Installation and configuration of software along with security and bugfix updates
- Configuration of batch scheduling and accounting systems
- Development of comprehensive system monitoring and alerting
- Commissioning of new platforms and services
- Contribute to technical specification and tender evaluation of new HPC and other infrastructure platforms
- Plan and implement the various stages of new platforms and services from commissioning and testing through to migration to production status
- Applicants must have a higher degree (Level 8) in computational science, computer science, or a related discipline, or equivalent experience (minimum 3 years) in a similar technical environment.
- Advanced Linux systems administration skills with at least 3 years' practical experience.
- Good knowledge of and experience in managing fault-tolerant clustered services and cluster management software.
- Good knowledge of local area networking, including Layer 2 switch and VLAN configuration, and of network services such as DNS and web servers (Apache, Nginx).
- Good knowledge of security principles and practices, including deploying firewalls and configuring SELinux, and experience using security monitoring and intrusion detection tools.
- Experience deploying configuration management tools (e.g. Ansible, SaltStack) and monitoring tools (e.g. Nagios, Icinga).
- Systematic approach to troubleshooting and problem solving.
- Experience managing HPC-specific parallel filesystems such as Lustre
- Knowledge of and experience managing private cloud technology such as OpenStack
- Knowledge of federated identity and authentication management systems
- Knowledge of software-defined storage clusters (e.g. Ceph)
Administrative Officer, Grade 4. Salary €44,659 to €50,031 per annum pro rata for shorter and/or part-time contracts (public sector pay policy rules pertaining to new entrants will apply).