How do I use Anaconda to install & manage software on Monsoon?

Running Anaconda on Monsoon

Monsoon offers several versions of Anaconda through our software module system. Because Anaconda is tightly integrated with Python, the available versions are divided into two forks: anaconda2, which provides Python 2.x, and anaconda3, which provides Python 3.x. The most recent version in each fork is that fork's default.

One can check what is currently available using the module avail command:

$ module avail anaconda2 anaconda3

------- /packages/Modules/3.2.10/supported ----------------------
anaconda2/2019.07          anaconda2/2019.10(default)

------- /packages/Modules/3.2.10/supported ----------------------
anaconda3/2019.07          anaconda3/2020.02
anaconda3/2019.10          anaconda3/2020.07(default)

After selecting the fork (or fork/version) you want to use, load it into your session or job with the module load command. The command prints nothing if it succeeds, but it adds Anaconda’s “conda” command and its bundled Python binary to your path:

$ module load anaconda3
$ which conda python   # just to verify
/packages/python/anaconda3/2020.07/bin/conda
/packages/python/anaconda3/2020.07/bin/python

If you only needed Anaconda for its bundled Python, then you’re done! If you want to use Anaconda to install software into your own personal environments, then continue on.

Creating an Anaconda (“conda”) environment

You can create as many Anaconda environments as you like, and once an environment is created you can install any available software into it. (Note, however, that only one environment can be active at a time.)

Note that once an environment is created, it will be permanently available to you (and your job scripts) until you explicitly delete it.
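Deleting an environment is done with conda itself. Here is a minimal sketch, assuming a bash shell with Anaconda already loaded; the function name delete_conda_env is hypothetical, while conda remove --name <env> --all is conda's standard command for removing an entire environment (conda asks for confirmation before deleting anything).

```shell
# Hypothetical helper: permanently delete a named conda environment.
delete_conda_env() {
    local env_name="$1"
    if [ -z "$env_name" ]; then
        echo "usage: delete_conda_env <environment_name>" >&2
        return 2
    fi
    # Refuse to delete the environment that is active in this shell;
    # run "conda deactivate" first.
    if [ "${CONDA_DEFAULT_ENV:-}" = "$env_name" ]; then
        echo "'$env_name' is active; run 'conda deactivate' first" >&2
        return 1
    fi
    # --all removes every package and the environment itself;
    # conda prompts for confirmation before proceeding.
    conda remove --name "$env_name" --all
}
```

Usage: delete_conda_env my_sandbox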

To create an environment, use the conda create -n <custom_environment_name> command:

$ conda create -n my_tools_1
   
## Package Plan ##
  environment location: /scratch/jtb49/conda/envs/my_tools_1

Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate my_tools_1
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As the above feedback indicates, you’ll still need to activate the environment.

You can also list your existing environments using the conda env list command. (This is demonstrated below.)
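Because environments persist, a setup script that runs conda create every time would fail or prompt on the second run. The sketch below, assuming bash and a loaded Anaconda module, parses conda env list to create an environment only when it is missing; the function names conda_env_exists and create_env_if_missing are hypothetical, and --yes is conda's flag for skipping the “Proceed ([y]/n)?” prompt.

```shell
# Hypothetical helper: succeed if a conda environment with this name exists.
conda_env_exists() {
    # "conda env list" prints "#" comment lines, then one line per
    # environment: its name, an optional "*" for the active one, and its path.
    conda env list | awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: create an environment only if it does not exist yet.
create_env_if_missing() {
    local env_name="$1"
    shift
    if conda_env_exists "$env_name"; then
        echo "environment '$env_name' already exists; skipping"
    else
        # Any extra arguments are passed through as package specs, e.g. plotly.
        conda create --yes --name "$env_name" "$@"
    fi
}
```

Usage: create_env_if_missing my_tools_1 plotly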

Activating and deactivating environments

Simply use the conda activate <environment_name> command to activate the environment you want, but keep in mind that in a job script, or a new terminal session, you’ll first have to load Anaconda using the module system (see above).

[wind]$ conda activate my_tools_1

(my_tools_1) [wind]$ conda env list
# conda environments:
#
base                /packages/python/anaconda3/2020.07
my_sandbox          /scratch/jtb49/conda/envs/my_sandbox
my_tools_1       *  /scratch/jtb49/conda/envs/my_tools_1

(my_tools_1) [wind]$ conda deactivate

[wind]$ 

Activating an environment effectively does four things:

  • modifies your session’s command prompt to indicate the active environment
  • makes the environment the target for conda install software installations
  • adds any software previously installed in the environment to your path ($PATH and/or library path), so that you can run it by name from a script or from the command line
  • adds the suite of additional tools bundled with Anaconda to your path (e.g. curl, jupyter, openssl, and many other tools and libraries)

The conda deactivate command has no direct feedback, but you’ll notice your command prompt no longer indicates an active environment name.
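The effects of activation can also be checked from the shell, which is handy inside scripts where there is no prompt to look at. conda activate records the active environment's name in the CONDA_DEFAULT_ENV variable and its directory in CONDA_PREFIX. This minimal bash sketch (the function names current_conda_env and which_in_env are hypothetical) reports those values and whether a command resolves to the active environment's bin directory.

```shell
# Hypothetical helper: report which conda environment (if any) is active.
# "conda activate" sets CONDA_DEFAULT_ENV (the name) and CONDA_PREFIX
# (the directory); "conda deactivate" clears or restores them.
current_conda_env() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "active: ${CONDA_DEFAULT_ENV} (${CONDA_PREFIX:-unknown prefix})"
    else
        echo "no conda environment active"
    fi
}

# Hypothetical helper: show where a command resolves on $PATH, and note
# when it comes from the active environment's bin directory.
which_in_env() {
    local cmd_path
    cmd_path="$(command -v "$1")" || { echo "$1: not found" >&2; return 1; }
    case "$cmd_path" in
        "${CONDA_PREFIX:-/nonexistent}"/bin/*) echo "$cmd_path (from active environment)" ;;
        *) echo "$cmd_path" ;;
    esac
}
```

Usage: after conda activate my_tools_1, run current_conda_env or which_in_env Rscript.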

Installing “common” software

By default, Monsoon makes the official ‘anaconda’ channel available for software searches and installations. Packages in this channel are curated and, to some degree, vetted by the Anaconda organization. They are typically some of the most popular and important software projects available, and they are packaged carefully enough that they rarely cause installation problems.

To see whether a software package is available from the default channel, use the conda search <package_name> command; any matches are displayed along with their available versions.

To install the package of your choice:

  • conda install <package_name> installs the latest version
  • conda install <pkg1> <pkg2> <pkg3> installs three packages at once
  • conda install <package_name>=<version> installs a specific version
  • any necessary dependencies are automagically installed as well!

(my_tools_1) [wind]$ conda search plotly
Loading channels: done
# Name               Version           Build  Channel
   ...
plotly                 4.8.1            py_0  pkgs/main
plotly                 4.8.2            py_0  pkgs/main
plotly                 4.9.0            py_0  pkgs/main

(my_tools_1) [wind]$ conda install plotly    # installs latest version
   ...
## Package Plan ##
  environment location: /scratch/jtb49/conda/envs/my_tools_1
  added / updated specs:
    - plotly
   ...
Proceed ([y]/n)?
Downloading and Extracting Packages
ca-certificates-2020 | 125 KB    | ################################ | 100%
plotly-4.9.0         | 3.5 MB    | ################################ | 100%
retrying-1.3.3       | 14 KB     | ################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
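The install above waits for an answer at the “Proceed ([y]/n)?” prompt, which is a problem in an unattended setup script. A minimal bash sketch, assuming Anaconda is loaded: conda install accepts --name to target an environment without activating it and --yes to answer the prompt automatically, and conda list shows what ended up installed. The function name install_into_env is hypothetical.

```shell
# Hypothetical helper: install packages into a named environment without
# prompting, then list the environment's contents.
install_into_env() {
    local env_name="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: install_into_env <environment_name> <package_spec>..." >&2
        return 2
    fi
    # --name targets the environment directly, so it need not be active;
    # --yes answers the confirmation prompt automatically.
    conda install --yes --name "$env_name" "$@" || return 1
    # Print the installed packages and their versions.
    conda list --name "$env_name"
}
```

Usage: install_into_env my_tools_1 plotly=4.9.0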

Installing “uncommon” software

The official default channel has “only” about 2,000 packages, but the easiest way to search across all channels is the Anaconda website (anaconda.org).

Most “popular” software that isn’t available from the official anaconda channel can be found in the “conda-forge” channel, so I’ll demonstrate searching and installing from that channel; the process is the same for every channel. Keep in mind, though, that other channels may not package their software as carefully as the official channel does, so installing from them carries a higher risk of problems.

The conda-forge channel, though, is still fairly safe. It is maintained by the Conda Forge organization, which works closely with the Anaconda organization, and it can be thought of as a beta-testing channel. (At the moment, for example, conda-forge is where one would obtain the new 4.0 version of R, which isn’t yet officially available through the default channel.)

To access this or any other channel, add the -c <channel_name> option to a search or install command:

(my_tools_1) [wind]$ conda search -c conda-forge r=4.0
Loading channels: done
# Name               Version           Build  Channel
   ...
r                        3.6        r36_1003  conda-forge
r                        3.6        r36_1004  conda-forge
r                        4.0        r40_1004  conda-forge

(my_tools_1) [wind]$ conda install -c conda-forge r=4.0
   ...
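One shell detail matters when passing version specs: characters such as <, >, | and * have special meaning to bash, so specs containing them should be quoted. Unquoted, conda search r>=4.0 would run conda search r and redirect its output into a file named “=4.0”. A minimal bash sketch of a wrapper for conda-forge (the function name forge is hypothetical) that passes quoted specs through unchanged:

```shell
# Hypothetical helper: search or install from the conda-forge channel.
# Quote specs containing <, >, | or * so the shell passes them to conda
# verbatim instead of treating them as redirections, pipes, or globs.
forge() {
    local action="$1"
    shift
    case "$action" in
        search)  conda search -c conda-forge "$@" ;;
        install) conda install -c conda-forge "$@" ;;
        *) echo "usage: forge {search|install} <package_spec>..." >&2; return 2 ;;
    esac
}
```

Usage: forge search "r>=4.0" lists matching builds, and forge install r=4.0 installs from conda-forge into the active environment.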

Using Anaconda environments with job scripts

When you submit an sbatch script to Monsoon’s scheduler, treat the script almost as if it were a brand-new interactive terminal session, unaware of any other sessions running now or earlier. (The same applies to the software module system.)

For example, you cannot activate an environment at the command line and then submit a script that immediately calls a program from that environment. The program wouldn’t be available to the script, because the script can’t “know” what happened before it ran. So this would fail:

$ cat jobscript.sh
   #!/bin/bash
   #SBATCH --chdir=/home/abc123
   Rscript --version  ## The *script* hasn't loaded R!

$ module load anaconda3
$ conda activate my_tools_1  ## Not "seen" by script!
$ sbatch jobscript.sh

…and your output file would contain a line saying “Rscript: command not found”.

But this would succeed because the environment is activated within the script…

$ cat jobscript.sh
   #!/bin/bash
   #SBATCH --chdir=/home/abc123
   module load anaconda3
   conda activate my_tools_1  ## NOW R is loaded/usable!
   Rscript --version

$ sbatch jobscript.sh

…and your output file would dutifully report the “beta” 4.0 version of R, installed earlier. (Both ‘R’ and ‘Rscript’ binaries are installed together.)
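In some setups, conda activate inside a non-interactive batch shell fails with an error saying the shell has not been configured to use it, because the startup files that define conda's shell functions were not read. A more defensive activation step is sketched below, assuming bash. The function name activate_conda_env and the file ~/conda_helpers.sh are hypothetical, and whether the eval "$(conda shell.bash hook)" step is actually needed on Monsoon is an assumption to verify (conda shell.bash hook is conda's documented way to set up its shell functions in scripts).

```shell
# Hypothetical helper, e.g. saved as ~/conda_helpers.sh and sourced from a
# job script. Example job script using it (names are illustrative):
#   #!/bin/bash
#   #SBATCH --chdir=/home/abc123
#   source ~/conda_helpers.sh
#   activate_conda_env my_tools_1 || exit 1
#   Rscript --version
activate_conda_env() {
    local env_name="$1"
    local hook
    # Load Anaconda through the module system unless conda is already available.
    if ! command -v conda >/dev/null 2>&1; then
        module load anaconda3 || return 1
    fi
    # Define conda's shell functions (including "conda activate") in this
    # non-interactive shell. Assumption: this may already be handled on Monsoon.
    hook="$(conda shell.bash hook)" || return 1
    eval "$hook"
    conda activate "$env_name"
}
```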