Running multiple LAMMPS jobs on a node

schadalapaka · September 26, 2018, 5:02pm

One of the researchers is trying to run multiple LAMMPS jobs on a node on our cluster where a job uses 1 core.

mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt

However, as the user submits multiple jobs that are similar:

…

mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt

mpirun -np 1 lmp_mpi < Project.txt7 > output_8_1.txt

…

and number of LAMMPS jobs running on a node increases, instances are using less than 100% CPU usage/job which is ultimately causing all the simulations to slow down.

Here’s the CPU usage of each of the simulations on one of the compute nodes using top command:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22037 username 20 0 598632 101664 1196 R 33.2 0.1 8082:11 lmp_mpi

22332 username 20 0 597292 100796 1196 R 33.2 0.1 8076:26 lmp_mpi

22345 username 20 0 596560 101572 1196 R 33.2 0.1 8084:15 lmp_mpi

I am have tried multiple options with MPI and Slurm like --exclusive option and trying to set --cpus-per-task --ntasks-per-node parameters but still see the same results.

Is this error caused because of how much I/O processing LAMMPS takes? If so, can we reduce the verbosity of LAMMPS ?

How can we get past this error?

System information:

CentOS 7

Slurm Scheduler

LAMMPS version - lammps-16Feb16

MPI Version - openmpi-1.8/gcc

Each of our compute nodes has either 20-cores / 24-cores and each core can run 1 process.

Here’s the complete job script of one of the simulations:

#######################

#!/usr/bin/env bash

#SBATCH --job-name=Sim-8

#SBATCH --partition=debug.q

#SBATCH --mem=1G

#SBATCH --export=ALL

#SBATCH --nodelist=mrcd08

module load lammps16

mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt

#########################

Sarvani Chadalapaka

HPC Administrator

University of California Merced, Office of Information Technology

schadalapaka · September 26, 2018, 5:02pm

It seems like the jobs are not binding to the correct core assigned by SLURM, but overloading cores.

In order for the job submission to run, the script had to be changed from

mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt

to

srun --hint=nomultithread lmp_mpi < Project.txt7 > output_18_1.txt

From Slurm documentation, --hint=nomultithread srun option causes Slurm to allocate only one thread from each core to this job.

Sarvani Chadalapaka

HPC Administrator

University of California Merced, Office of Information Technology