
ICHEC Software

Information about software packages installed on the ICHEC systems.

LAMMPS

Versions Installed

Fionn: 11-Nov-2013, 1-Feb-2014

Stoney: 9-Jul-2009, 24-Apr-2013

Description

LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state. It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions.

In the most general sense, LAMMPS integrates Newton's equations of motion for collections of atoms, molecules, or macroscopic particles that interact via short- or long-range forces with a variety of initial and/or boundary conditions. LAMMPS is most efficient (in a parallel sense) for systems whose particles fill a 3d rectangular box with roughly uniform density.

LAMMPS includes many optional packages, which are groups of files that enable a specific set of features. For example, force fields for molecular systems or granular systems are in packages. The packages currently available in LAMMPS are listed at http://lammps.sandia.gov/doc/Section_packages.html.
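
Packages are selected when LAMMPS is compiled, and the ICHEC installations already include the GPU and USER-CUDA packages discussed below. Purely for reference, a package is switched on in the standard LAMMPS build system roughly as follows (a sketch only; the machine makefile name is illustrative, and the GPU package additionally requires the library in lib/gpu to be built first):

cd lammps/src
make yes-gpu          # enable the GPU package in the source tree
make package-status   # list which packages are currently enabled
make linux            # compile with the Makefile.linux machine makefile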

GPU-enabled LAMMPS

Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation using the NVIDIA hardware on Fionn and Stoney, but they do it in different ways.

As a consequence, for a particular simulation on Fionn or Stoney, one package may be faster than the other. We give guidelines below, but the best way to determine which package is faster for your input script is to try both of them. See the benchmarking section below for examples where this has been done.
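
For instance, with the 11Nov13 module loaded on Fionn GPU nodes, the same system can be timed with each package simply by changing the suffix flag (a sketch; the two input files differ only in their package lines, as shown in the full examples further down, and the output file names are illustrative):

# standard GPU package (USER-CUDA disabled with -c off)
mpiexec lmp_gpu -sf gpu -c off -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.gpu > out.gpu
# equivalent run with the USER-CUDA package (one MPI rank per GPU)
mpiexec -ppn 2 lmp_gpu -sf cuda -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.cuda > out.cuda

The wall-clock times reported on the Loop time line of the two log files can then be compared directly.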

The Standard GPU package

The GPU package was developed by Mike Brown at ORNL. It provides GPU versions of several pair styles and of long-range Coulombics via the PPPM command. It has the following features:

  • The package is designed to exploit common GPU hardware configurations where one or more GPUs are coupled with many cores of a multi-core CPU, e.g. within a node of a parallel machine.
  • Atom-based data (e.g. coordinates, forces) moves back-and-forth between the CPU(s) and GPU every time-step.
  • Neighbor lists can be constructed on the CPU or on the GPU.
  • The charge assignment and force interpolation portions of PPPM can be run on the GPU. The FFT portion, which requires MPI communication between processors, runs on the CPU.
  • Asynchronous force computations can be performed simultaneously on the CPU(s) and GPU.

The GPU package allows you to assign multiple CPUs (cores) to a single GPU (a common configuration for "hybrid" nodes that contain multicore CPU(s) and GPU(s)) and works effectively in this mode.
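
When several cores share a GPU, part of the pair-force work can also be left on the host cores. In the older package gpu syntax used by these versions (package gpu mode first last split), the final argument is the fraction of particles offloaded to the GPU, and a negative value asks LAMMPS to balance the split dynamically. A sketch (GPU IDs and split values are illustrative):

# all pair work on GPUs 0-1, as in the example input files below
package gpu force/neigh 0 1 1
# or let LAMMPS balance pair work dynamically between host cores and GPUs
package gpu force/neigh 0 1 -1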

The GPU package accelerates only pair force, neighbor list, and PPPM calculations.

The USER-CUDA package

The USER-CUDA package was developed by Christian Trott at Ilmenau University of Technology in Germany. It provides NVIDIA GPU versions of many pair styles, many fixes and a few computes, as well as of long-range Coulombics via the PPPM command. It has the following features:

  • The package is designed to allow an entire LAMMPS calculation, for many time-steps, to run entirely on the GPU (except for inter-processor MPI communication), so that atom-based data (e.g. coordinates, forces) do not have to move back-and-forth between the CPU and GPU.
  • The speed-up advantage of this approach is typically better when the number of atoms per GPU is large.
  • Data will stay on the GPU until a time-step where a non-GPU-ized fix or compute is invoked. Whenever a non-GPU operation occurs (fix, compute, output), data automatically moves back to the CPU as needed. This may incur a performance penalty, but should otherwise work transparently.
  • Neighbor lists for GPU-ized pair styles are constructed on the GPU.
  • The package only supports use of a single CPU (core) with each GPU.
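
This one-core-per-GPU restriction is why the USER-CUDA job scripts below launch only two MPI ranks per node (mpiexec -ppn 2 on Fionn, mpirun -npernode 2 on Stoney), i.e. one rank per GPU.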

The USER-CUDA package offers more speed-up relative to CPU performance when the number of atoms per GPU is large, e.g. on the order of tens or hundreds of thousands.

As noted above, this package will continue to run a simulation entirely on the GPU(s) (except for inter-processor MPI communication), for multiple time-steps, until a CPU calculation is required, either by a fix or compute that is non-GPU-ized, or until output is performed (thermo or dump snapshot or restart file). The less often this occurs, the faster your simulation will run.
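
In practice this means keeping thermo, dump and restart output as infrequent as the analysis allows, for example (illustrative intervals and file names):

thermo 1000                   # thermodynamic output every 1000 steps only
dump 1 all atom 5000 dump.lj  # trajectory snapshot every 5000 steps
restart 10000 restart.lj      # restart file every 10000 steps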

Differences between the two packages:

  • The GPU package accelerates only pair force, neighbor list, and PPPM calculations. The USER-CUDA package currently supports a wider range of pair styles and can also accelerate many fix styles and some compute styles, as well as neighbor list and PPPM calculations.
  • The USER-CUDA package does not support acceleration for minimization.
  • The USER-CUDA package does not support hybrid pair styles.
  • The USER-CUDA package can order atoms in the neighbor list differently from run to run, resulting in a different order of force accumulation.
  • The USER-CUDA package has a limit on the number of atom types that can be used in a simulation.
  • The GPU package requires neighbor lists to be built on the CPU when using exclusion lists or a triclinic simulation box (see the sketch after this list).
  • The GPU package uses more GPU memory than the USER-CUDA package. This is generally not a problem since typical runs are computation-limited rather than memory-limited.
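
In the last case above, the neighbor-list build can be kept on the CPU while the pair forces still run on the GPU by selecting the force mode of the package gpu command rather than force/neigh. A minimal sketch in the older syntax used by these versions (GPU IDs are illustrative):

# pair forces on GPUs 0-1, neighbor lists built on the CPU
package gpu force 0 1 1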

License

LAMMPS is a freely-available open-source code, distributed under the terms of the GNU General Public License (GPL), which means it can be used, modified and redistributed freely, subject to the conditions of that license.

Benchmarks

The performance of LAMMPS running on Fionn and on our old system, Stokes, is shown in the figure below. The calculation is for a Lennard-Jones liquid with 2,097,152 atoms. The ICEX results refer to the main partition on Fionn. For the GPU-enabled calculations on Fionn, two K20X GPUs were used per node and all calculations on the GPUs were carried out in single precision. The GPU results refer to the standard GPU package on Fionn and the CUDA results refer to the USER-CUDA package on Fionn.

Additional Notes

To use LAMMPS on Fionn, load the molecular modelling environment module:

module load molmodel

A specific version of LAMMPS can then be loaded:

module load lammps/intel/11Nov13
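
Once the molmodel environment is loaded, the installed LAMMPS builds can be listed (the exact output depends on the system and may change over time):

module avail lammps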

LAMMPS simulations should be submitted using a PBS script. For information on how to write a PBS script, please refer to the batch processing tutorial on our tutorials page or the documentation section. For 24 MPI processes, the LAMMPS executable may be run as follows (after loading the appropriate module):

mpirun -np 24 lmp < myInputFile > myOutputFile

To run LAMMPS with GPU package support on Fionn, follow this example (8 Fionn nodes with 2 GPUs per node):

A simple PBS script:

#!/bin/bash
#PBS -N LAMMPS_Test
#PBS -j oe
#PBS -r n
#PBS -A sci_test
#PBS -l nodes=8:ppn=20
#PBS -l walltime=10:00:00
#PBS -o output-gpu
#PBS -q GpuQ

module purge
module load molmodel lammps/intel/11Nov13

cd $PBS_O_WORKDIR

mpiexec lmp_gpu -sf gpu -c off -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.gpu > lj_out_2gpus_8nd
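
Here -sf gpu appends the /gpu suffix to the supported styles in the input so that their GPU variants are used, -c off (short for -cuda off) keeps the USER-CUDA package inactive for this GPU-package run, and the -v flags define the input-script variables g, x, y, z and t used by in.lj.gpu below.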

To run LAMMPS with GPU package support on Stoney, follow this example (8 Stoney nodes with 2 GPUs per node):

A simple PBS script:

#!/bin/bash
#PBS -N LAMMPS_Test
#PBS -A project_name
#PBS -q GpuQ
#PBS -r n
#PBS -l nodes=8:ppn=8,walltime=24:00:00

cd $PBS_O_WORKDIR

module load lammps/24Apr13

mpirun -np 64 -npernode 8 lmp -sf gpu -c off -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.gpu > lj_out_2gpus_8nd.out

The in.lj.gpu input file used in the example above is set up as follows:

A simple GPU input file:

# 3d Lennard-Jones melt

# newton off is required for GPU package
# set variable g = 1/2 for 1/2 GPUs

newton off
if "$g == 1" then "package gpu force/neigh 0 0 1"
if "$g == 2" then "package gpu force/neigh 0 1 1"

units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 $x 0 $y 0 $z
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve
run $t
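
The variables g, x, y, z and t are not defined in the file itself; they are supplied on the command line through the -v flags shown above and expanded wherever $g, $x and so on appear. Defaults could equally be given in the input with index-style variable commands, which command-line definitions take precedence over (illustrative values):

# defaults used only when no -v value is supplied on the command line
variable x index 64
variable y index 64
variable z index 128
variable t index 100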

To run LAMMPS with USER-CUDA package support on Fionn, follow this example (8 Fionn nodes with 2 GPUs per node):

A simple PBS script:

#!/bin/bash
#PBS -N lammps_lj
#PBS -j oe
#PBS -r n
#PBS -A sci_test
#PBS -l nodes=8:ppn=20
#PBS -l walltime=00:30:00
#PBS -o output-cuda
#PBS -q GpuQ

module purge
module load molmodel lammps/intel/11Nov13

cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=1
export I_MPI_FABRICS=shm:ofa

mpiexec -ppn 2 lmp_gpu -sf cuda -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.cuda > lj_out_2cuda_8nd

To run LAMMPS with USER-CUDA package support on Stoney, follow this example (8 Stoney nodes with 2 GPUs per node):

A simple PBS script:

#!/bin/bash
#PBS -N LAMMPS_Test
#PBS -A project_name
#PBS -q GpuQ
#PBS -r n
#PBS -l nodes=8:ppn=8,walltime=24:00:00

cd $PBS_O_WORKDIR

module load lammps/24Apr13

mpirun -np 16 -npernode 2 lmp -sf cuda -v g 2 -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.cuda > lj_out_2cuda_8nd.out

The in.lj.cuda input file used in the example above is set up as follows:

A simple USER-CUDA input file:

# 3d Lennard-Jones melt

# set variable g = 1/2 for 1/2 GPUs
if "$g == 1" then "package cuda gpu/node 1"
if "$g == 2" then "package cuda gpu/node 2"

units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 $x 0 $y 0 $z
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve
run $t

To run LAMMPS on the Fionn CPU nodes, follow this example (8 Fionn nodes):

A simple PBS script:

#!/bin/bash
#PBS -N LAMMPS_Test
#PBS -j oe
#PBS -r n
#PBS -A project_name
#PBS -l nodes=8:ppn=24
#PBS -l walltime=10:00:00
#PBS -o output-cpu

module purge
module load molmodel lammps/intel/11Nov13

cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=1

mpiexec lmp -v x 64 -v y 64 -v z 128 -v t 100 < in.lj.cpu > lj_out_8nd.out

The in.lj.cpu input file used in the example above is set up as follows:

A simple input file:

# 3d Lennard-Jones melt

units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 $x 0 $y 0 $z
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve
run $t

Further information can be obtained at http://lammps.sandia.gov/.
