I have a beginner question if I may ask: if I want to create an "environment" shared between multiple users, are Apptainer containers my go-to solution? Is there no such thing as a shared conda env?
For my use case, I don't want users to modify the environment. I am told I can't share a conda env between multiple users, and hence that I should create a container. Is that true? If so, do I go to my local machine with the good working environment and create a container there? Or do I go to the HPC node, set up the env first, and then "containerize" it?
I would appreciate a crash course video or article if possible. Thanks all and your help is always appreciated.
P.S. This is in the context of moving working DL code to an HPC node.
It is possible to share a conda env between multiple users, but it can be tricky. You can set a non-default path for a conda environment installation with conda create -p <path> instead of conda create -n <name>. Then to load, the user just executes conda activate <full path>.
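For example, with a placeholder shared path:

```bash
# The env owner creates the environment at a shared prefix
# instead of in their personal ~/.conda/envs:
conda create -p /shared/project/envs/dl-env python numpy

# Any user with read access then activates it by full path:
conda activate /shared/project/envs/dl-env
```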
We’ve helped users set up group-wide conda environments, and even have some system-wide environments for various things (like JupyterLab).
If I conda create in our shared project folder, for example, will other users find my env when they do conda env list without having "created" or initialized it? Will their conda automatically scan all their shared folders and index it?
@alkurdi Not automatically, but you can set it up that way. You can set a system-level directory that conda will look in for environments, or users can set it in their .condarc.
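For example (the shared path here is just a placeholder):

```bash
# Adds the shared directory to the envs_dirs list conda searches;
# this writes the entry into the user's ~/.condarc
conda config --append envs_dirs /shared/project/envs

# Environments under that directory now show up here:
conda env list
```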
You can make a conda environment into a module: define the paths in the module file so that "module load env" sets the same paths that "conda activate env" would. I figure out which paths those are by diffing the output of env before and after activation. I can be more detailed if you like.
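Roughly, the diff step looks like this (placeholder env path):

```bash
# Capture the shell environment before and after activation,
# then diff to see which variables the module file must set
env | sort > /tmp/env.before
conda activate /shared/project/envs/dl-env
env | sort > /tmp/env.after
diff /tmp/env.before /tmp/env.after
# Typically PATH, CONDA_PREFIX, CONDA_DEFAULT_ENV, etc. change;
# replicate those in the modulefile's prepend-path/setenv lines.
```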
The bigger problem is if you want users to write to the environment. I recommend that everyone create the environment from a YAML file and never change it after the initial create, or else you'll break everything. If you need changes, just make a new one.
Whatever way you produce the file, you can place it in a GitHub or GitLab repository to share with collaborators locally and remotely, as well as track changes via git.
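A minimal sketch of that workflow, with placeholder paths:

```bash
# Export the spec from the working environment
conda env export > environment.yml

# Track it in git so collaborators get exactly the same spec
git add environment.yml
git commit -m "Pin the shared DL environment"

# Anyone (or any machine) recreates the env from the file
conda env create -p /shared/project/envs/dl-env -f environment.yml
```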
Thank you! Sharing envs from a yml doesn't really work in my case, since some packages are installed manually via wget or git clone, like the keras-contrib package.
In that case you may have a few options, depending on how detailed you want to get. Containers are sounding better here if you've got custom software to install. Pip can pull git repos (even specific commits and branches; see https://pip.pypa.io/en/stable/cli/pip_install/#examples), though it isn't meant for software that is only distributed as a binary release tarball. Pip can of course be used within a conda env.
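For example, since keras-contrib came up above, something like this should work inside the activated env (the git ref is just illustrative):

```bash
# Install directly from the git repo; pin a branch/tag/commit
# after the @ for reproducibility
pip install "git+https://github.com/keras-team/keras-contrib.git"
pip install "git+https://github.com/keras-team/keras-contrib.git@master"
```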
A container recipe would probably be more robust if you need to install binary releases or similar that aren’t available via conda.
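As a rough, untested sketch (the base image and paths are assumptions; adjust for your site):

```bash
# Write a minimal Apptainer def file that bakes a conda env
# from an environment.yml into the image, then build it.
# Assumes environment.yml sits next to the def file.
cat > dl-env.def <<'EOF'
Bootstrap: docker
From: condaforge/miniforge3:latest

%files
    environment.yml /environment.yml

%post
    conda env create -p /opt/env -f /environment.yml
    conda clean -afy

%environment
    export PATH=/opt/env/bin:$PATH
EOF

# Build the image (may need --fakeroot or sudo, depending on the site)
apptainer build dl-env.sif dl-env.def
```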
Thank you all for your responses. I am a little surprised no one said "you should just use containers," as most of the articles online do. If I do not yet have an env created on the cluster, is it better to recreate it using conda (a painful experience for this project specifically), or is it easier to use Apptainer?
@alkurdi You’re going to have to recreate the environment anyway, either on the cluster or in a container. You can export a conda environment to make reinstallation easy (well, easier). See the link that @wwarr posted above for instructions on exporting the yml and then creating a new environment from the file.
@alkurdi most likely yes. Conda environments do not like to be moved. Containers are built from container recipes (Dockerfile or Apptainer/Singularity def file, for example).
Even when creating from a yml file, sometimes the package build strings can prevent building on other systems. You'll see something like -package=version=build, but you'll just need -package=version. And of course some niche packages are only available for some platforms (Win/Mac/Linux).
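One way around the build pins (with a reasonably recent conda) is to export without them:

```bash
# Export without the build-string pins that often break
# recreating the env on a different system
conda env export --no-builds > environment.yml
```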