What is the best way to customize a modulefile (used to select a particular version of a software package) based on some properties of the node? For example, if the node has a GPU, I would like to set the PATH environment variable to point to the installation directory of the Tensorflow build whose binaries handle GPU computations. If, however, the node does not have GPUs, I would like to set the PATH to another directory that contains the CPU version of the Tensorflow package.
A modulefile is just a Tcl script, so you can use Tcl constructs in it. Here I’ve used an “exec” statement to grep the output of lspci and see whether it finds an NVIDIA device. If it does, I’ll set my path accordingly:
# catch returns true if the pipeline fails, i.e. grep finds no match or lspci cannot be run
if {[catch {exec /sbin/lspci | grep NVIDIA} results options]} {
    set gpu_available false
    append-path PATH /path/to/cpu/tensorflow
} else {
    set gpu_available true
    append-path PATH /path/to/gpu/tensorflow
}
puts stderr "Has GPU? $gpu_available"
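If lspci is not installed on every node, one variation is to test for the device node that the NVIDIA kernel driver normally creates. This is only a minimal sketch: it assumes /dev/nvidia0 is present on GPU nodes once the driver is loaded, which is worth verifying on your own cluster, and it reuses the placeholder Tensorflow paths from the example above.

```tcl
#%Module1.0
# Assumption: GPU nodes expose /dev/nvidia0 once the NVIDIA driver is loaded.
if {[file exists /dev/nvidia0]} {
    # GPU node: use the GPU build (placeholder path)
    append-path PATH /path/to/gpu/tensorflow
} else {
    # No device node found: fall back to the CPU build (placeholder path)
    append-path PATH /path/to/cpu/tensorflow
}
```

This avoids spawning an external process each time the module is evaluated, at the cost of depending on the driver being loaded rather than on the hardware merely being present.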
Thank you, Ben!
This is exactly what I was looking for.
–Katia
We went with two separate directory trees for software instead: /opt for non-GPU nodes and /opt_cuda for GPU nodes.
Then, in each node image, configure Lmod to use the appropriate modulefile directory. Default modules can be set with /usr/share/modulefiles/DefaultModules.lua.
The advantage of this approach is that you don’t need the if-then in every modulefile, i.e. the choice of module location is configured in only one place.
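As a rough illustration of what that single place could look like, here is a small modulefile that might be baked into the GPU node image to add the GPU tree to the module search path. The CPU image would carry the same file pointing at /opt. The file itself and the modulefiles subdirectory are assumptions made for this sketch, not the actual site configuration described above. Lmod can read Tcl modulefiles like this one as well as Lua ones.

```tcl
#%Module1.0
# Hypothetical site modulefile for the GPU node image.
# Assumption: modulefiles live in a "modulefiles" subdirectory of the tree.
module use /opt_cuda/modulefiles
```

With this in place, the individual Tensorflow modulefiles in each tree need no GPU detection at all.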