I have a beginner question if I may ask: if I want to create an "environment" shared between multiple users, are Apptainer containers my go-to solution? Is there no such thing as a shared conda env?
For my use case, I don't want users to modify the environment. I am told I can't share a conda env between multiple users, and hence that I should create a container. Is that true? If so, do I go to my local machine with the good working environment and create a container there? Or do I go to the HPC node, set up the env first, and then "containerize" it?
I would appreciate a crash course video or article if possible. Thanks all and your help is always appreciated.
P.S. This is in the context of moving working DL code to an HPC node.
It is possible to share a conda env between multiple users, but it can be tricky. You can set a non-default path for a conda environment installation with conda create -p <path> instead of conda create -n <name>. Then to load, the user just executes conda activate <full path>.
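For example, with a placeholder shared path:

```bash
# The env owner creates the environment at a shared prefix
# instead of in their personal ~/.conda/envs:
conda create -p /shared/project/envs/dl-env python numpy

# Any user with read access then activates it by full path:
conda activate /shared/project/envs/dl-env
```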
We’ve helped users set up group-wide conda environments, and even have some system-wide environments for various things (like JupyterLab).
If I conda create in our shared project folder, for example, will other users find my env when they do conda env list without having "created" or initialized it? Will their conda automatically scan all their shared folders and index it?
@alkurdi Not automatically, but you can set it up that way. You can set a system-level directory that conda will look in for environments, or users can set it in their .condarc.
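For example (the shared path here is just a placeholder):

```bash
# Adds the shared directory to the envs_dirs list conda searches;
# this writes the entry into the user's ~/.condarc
conda config --append envs_dirs /shared/project/envs

# Environments under that directory now show up here:
conda env list
```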
You can make a conda environment into a module: define the paths in the module file so that "module load env" sets the same paths that "conda activate env" would. I figure out which paths those are by diffing the output of env before and after activation. I can be more detailed if you like.
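Roughly, the diff step looks like this (placeholder env path):

```bash
# Capture the shell environment before and after activation,
# then diff to see which variables the module file must set
env | sort > /tmp/env.before
conda activate /shared/project/envs/dl-env
env | sort > /tmp/env.after
diff /tmp/env.before /tmp/env.after
# Typically PATH, CONDA_PREFIX, CONDA_DEFAULT_ENV, etc. change;
# replicate those in the modulefile's prepend-path/setenv lines.
```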
The bigger problem is if you want users to write to the environment. I recommend that everyone create the environment from a YAML file and never change it after the initial create, or else you'll break everything. If you need changes, just make a new one.
Whatever way you produce the file, you can place it in a GitHub or GitLab repository to share with collaborators locally and remotely, as well as track changes via git.
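A minimal sketch of that workflow, with placeholder paths:

```bash
# Export the spec from the working environment
conda env export > environment.yml

# Track it in git so collaborators get exactly the same spec
git add environment.yml
git commit -m "Pin the shared DL environment"

# Anyone (or any machine) recreates the env from the file
conda env create -p /shared/project/envs/dl-env -f environment.yml
```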
Thank you! Sharing envs from a yml doesn't really work in my case, since some packages are installed manually via wget or git clone, like the keras-contrib package.
In that case you may have a few options, depending on how detailed you want to get. Containers are sounding better here if you've got custom software to install. Pip can pull git repos (even specific commits and branches; see https://pip.pypa.io/en/stable/cli/pip_install/#examples), though it isn't meant for software that is only distributed as a binary release tarball. Pip can of course be used within a conda env.
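For example, since keras-contrib came up above, something like this should work inside the activated env (the git ref is just illustrative):

```bash
# Install directly from the git repo; pin a branch/tag/commit
# after the @ for reproducibility
pip install "git+https://github.com/keras-team/keras-contrib.git"
pip install "git+https://github.com/keras-team/keras-contrib.git@master"
```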
A container recipe would probably be more robust if you need to install binary releases or similar that aren’t available via conda.
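As a rough, untested sketch (the base image and paths are assumptions; adjust for your site):

```bash
# Write a minimal Apptainer def file that bakes a conda env
# from an environment.yml into the image, then build it.
# Assumes environment.yml sits next to the def file.
cat > dl-env.def <<'EOF'
Bootstrap: docker
From: condaforge/miniforge3:latest

%files
    environment.yml /environment.yml

%post
    conda env create -p /opt/env -f /environment.yml
    conda clean -afy

%environment
    export PATH=/opt/env/bin:$PATH
EOF

# Build the image (may need --fakeroot or sudo, depending on the site)
apptainer build dl-env.sif dl-env.def
```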
Thank you all for your responses. I am a little surprised no one said "you should just use containers," as most of the articles online do. If I do not yet have an env created on the cluster, is it better to recreate it using conda (a painful experience for this project specifically), or is it easier to use Apptainer?
@alkurdi You’re going to have to recreate the environment anyway, either on the cluster or in a container. You can export a conda environment to make reinstallation easy (well, easier). See the link that @wwarr posted above for instructions on exporting the yml and then creating a new environment from the file.
@alkurdi most likely yes. Conda environments do not like to be moved. Containers are built from container recipes (Dockerfile or Apptainer/Singularity def file, for example).
Even when creating from a yml file, sometimes the package build strings can prevent building on other systems. You'll see something like -package=version=build, but you'll just need -package=version. And of course some niche packages are only available for some platforms (Win/Mac/Linux).
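One way around the build pins (with a reasonably recent conda) is to export without them:

```bash
# Export without the build-string pins that often break
# recreating the env on a different system
conda env export --no-builds > environment.yml
```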