One of our researchers is trying to run multiple LAMMPS jobs on a single node of our cluster, where each job uses 1 core:
mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt
However, as the user submits multiple similar jobs:
…
mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt
mpirun -np 1 lmp_mpi < Project.txt7 > output_8_1.txt
…
and the number of LAMMPS jobs running on a node increases, each instance uses less than 100% of a CPU, which ultimately slows down all of the simulations.
Here’s the CPU usage of each of the simulations on one of the compute nodes, as reported by the top command (note that the three instances each sit at roughly 33% CPU):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22037 username 20 0 598632 101664 1196 R 33.2 0.1 8082:11 lmp_mpi
22332 username 20 0 597292 100796 1196 R 33.2 0.1 8076:26 lmp_mpi
22345 username 20 0 596560 101572 1196 R 33.2 0.1 8084:15 lmp_mpi
I have tried multiple MPI and Slurm options, such as --exclusive, and setting the --cpus-per-task and --ntasks-per-node parameters, but I still see the same results.
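For concreteness, the variations I tried looked roughly like the following (the exact values varied between attempts; these are placeholders, not a known-good configuration):

```shell
#!/usr/bin/env bash
#SBATCH --job-name=Sim-8
#SBATCH --partition=debug.q
#SBATCH --mem=1G
#SBATCH --export=ALL
#SBATCH --nodelist=mrcd08
# Options tried, individually and in combination:
#SBATCH --exclusive            # reserve the whole node for this job
#SBATCH --ntasks-per-node=1    # one MPI rank per node
#SBATCH --cpus-per-task=1      # one core per rank
module load lammps16
mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt
```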
Is this caused by the amount of I/O processing LAMMPS does? If so, can we reduce the verbosity of LAMMPS?
How can we get past this problem?
System information:
CentOS 7
Slurm Scheduler
LAMMPS version - lammps-16Feb16
MPI Version - openmpi-1.8/gcc
Each of our compute nodes has either 20 or 24 cores, and each core can run 1 process.
Here’s the complete job script of one of the simulations:
#######################
#!/usr/bin/env bash
#SBATCH --job-name=Sim-8
#SBATCH --partition=debug.q
#SBATCH --mem=1G
#SBATCH --export=ALL
#SBATCH --nodelist=mrcd08
module load lammps16
mpirun -np 1 lmp_mpi < Project.txt7 > output_7_1.txt
#########################
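In case it helps with diagnosis, one thing I have been checking is the CPU affinity of the running processes; if several single-rank jobs all report the same single core, they have been pinned on top of each other. A minimal sketch ("self" below is a placeholder for an actual lmp_mpi PID from top):

```shell
# Show which cores the kernel may schedule a process on.
# Substitute the PID of an lmp_mpi process from top for "self",
# e.g. /proc/22037/status.
grep Cpus_allowed_list /proc/self/status

# OpenMPI can also print its binding decisions at launch time,
# which I believe openmpi-1.8 supports, e.g.:
#   mpirun --report-bindings -np 1 lmp_mpi < Project.txt7 > output_7_1.txt
```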
Sarvani Chadalapaka
HPC Administrator
University of California Merced, Office of Information Technology