Check out our video on HPC terms and concepts. It includes visual explanations of: nodes, processors, and cores; tasks, threads, and processes; and shared vs distributed memory.
HPC Terminology and ‘Core’ Concepts - What’s in a Node?
This 5-minute video consists of excerpts from our much longer tutorial on running parallel jobs on Henry2, NCSU’s campus cluster. I have suggested the full tutorial to students using other clusters, but it was inconvenient to include a list of the relevant time stamps!
The video is the result of loud complaints that there was little documentation on these frequently used terms, and that what documentation we had was poor. The problem was compounded by Henry2’s heterogeneous architecture and the complex resource syntax of its scheduler, LSF. For example, to request an 8-core node, you need to request “qc”, the resource for nodes with dual quad-core processors. Additionally, by default, nodes are shared and a job’s cores are spread across nodes, wherever cores happen to be free. Unsuspecting users would request 8 cores for a shared memory job, and LSF would schedule 1 task on the first node and 7 on the next. The program would then autospawn threads according to how many cores were on the first assigned node, not how many it had been allocated there. One or several other jobs could be doing the same thing on that node. We do not kill jobs that use more resources than requested, so nodes were overloaded and execution times seemed completely random! People who might have been completely new to Linux and the command line needed to understand all of these concepts just to run jobs without violating the Acceptable Use Policy!
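To make the oversubscription scenario concrete, here is a minimal Python sketch of how a shared memory program could size its thread pool from what the scheduler actually allocated on the first host, rather than from the node's total core count. It assumes LSF exposes the allocation in the `LSB_MCPU_HOSTS` environment variable as alternating host names and slot counts (for example `n1 1 n2 7`); check that against your own cluster before relying on it. The function names `parse_mcpu_hosts` and `safe_thread_count` are hypothetical, made up for this illustration.

```python
import os


def parse_mcpu_hosts(value):
    """Parse a 'hostA 1 hostB 7' style string into [(host, slots), ...].

    Assumption: this mirrors the format of LSF's LSB_MCPU_HOSTS variable.
    """
    fields = value.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating host names and slot counts")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def safe_thread_count(mcpu_hosts, cores_on_node):
    """Return how many threads to start on the first assigned host.

    Uses the slots allocated on that host, capped at the node's core count,
    instead of blindly using every core the node has.
    """
    allocation = parse_mcpu_hosts(mcpu_hosts)
    if not allocation:
        # No scheduler information available: fall back to a single thread.
        return 1
    first_host_slots = allocation[0][1]
    return max(1, min(first_host_slots, cores_on_node))


if __name__ == "__main__":
    mcpu = os.environ.get("LSB_MCPU_HOSTS", "")
    cores = os.cpu_count() or 1
    print("cores on this node:", cores)
    print("threads to start:  ", safe_thread_count(mcpu, cores))
```

With the allocation from the example above (`n1 1 n2 7`) on an 8-core node, the sketch would start 1 thread instead of 8. The better fix is still to request all cores on a single node in the first place, but a check like this keeps a misconfigured job from overloading a shared node.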
In creating the video, I dug through several MPI lectures to figure out ‘the proper way’ to use these terms. I found that every lecturer - without exception - defined the terminology and went on to say “Apologies in advance that I’ll use the wrong terms, but back when I learned MPI…” I do that too, and still use ‘numprocs’ when I write code. I discovered a great slide with “Processing Element” and other terms in an XSEDE MPI Workshop by John Urbanic. The “SHOW MORE” section below the video gives credit for that slide and also for the suspiciously familiar-looking node diagrams.