What is the preferred way to install R packages/libraries on Monsoon?

Prerequisites

Understanding external dependencies

Most R package installation problems occur because the package requires external software that is not available through CRAN (the Comprehensive R Archive Network, R’s main package repository). Every R package should have documentation from its developers stating which external dependencies it has. Many, but not all, of these dependencies can be satisfied using our module system. Knowing these external dependencies is usually what we need in order to help get difficult R packages installed.

A good place to start is the package’s page on CRAN’s website, which often links to the developer’s original documentation (PDFs or web pages). If your desired package has such external dependencies, please submit a request via our software-request form so we can look into providing that software.

As an example, rgdal’s entry on CRAN states in the description what the external dependencies are (“The GDAL and PROJ libraries are external to the package, and must be correctly installed first”)… but other packages may require deeper inspection of the developer’s documentation.

Also, please keep in mind that (as rgdal’s description hints) developers often don’t consider HPC environments, where standard installations are frequently not possible. In most cases, however, these prerequisites can be fulfilled using our module system and/or a personal Anaconda environment.
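As a rough illustration of checking the module system for a package's external dependencies, the sketch below asks `module avail` about each name it is given. The function name `check_modules` is hypothetical, and the module names used in the example (`gdal`, `proj`) are assumptions based on rgdal's stated requirements; the actual module names on Monsoon may differ.

```shell
#!/bin/bash
# check_modules: hypothetical helper, not part of Monsoon's tooling.
# For each name given, ask the module system which matching modules exist.
check_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found in this shell" >&2
        return 1
    fi
    local name
    for name in "$@"; do
        echo "== $name =="
        # 'module avail' commonly writes its listing to stderr, so merge streams.
        module avail "$name" 2>&1
    done
}

# Example (module names are assumptions, see above):
#   check_modules gdal proj
```

If a dependency does not show up, that is a good point at which to submit a software request or consider a personal Anaconda environment.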

Login node vs compute node

Many R packages require source code to be compiled as part of their installation process. Some advanced users may prefer doing compilation tasks directly on the compute nodes, as opposed to the login nodes like wind, rain, and ondemand. However, for technical reasons, the compute nodes don’t carry the same complement of development-supporting software packages that the login nodes do.

For this reason, we recommend that R packages only be installed from an interactive command line on the login nodes.
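One way to avoid starting an installation on the wrong machine is a small guard that checks the host name first. This is only a sketch: the function names are hypothetical, and it assumes that `hostname -s` on the login nodes returns exactly the short names used in this article (wind, rain, ondemand).

```shell
#!/bin/bash
# is_login_node: hypothetical helper. Succeeds when the given short host name
# is one of the login nodes named in this article (wind, rain, ondemand).
is_login_node() {
    case "$1" in
        wind|rain|ondemand) return 0 ;;
        *) return 1 ;;
    esac
}

# require_login_node: report whether a host is a login node.
# Uses the current host when no argument is given (assumes 'hostname -s'
# yields the short names listed above).
require_login_node() {
    local host="${1:-$(hostname -s)}"
    if is_login_node "$host"; then
        echo "OK: $host is a login node"
    else
        echo "Warning: $host is not a login node; install R packages from a login node instead" >&2
        return 1
    fi
}

# Example:
#   require_login_node && R
```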

The OnDemand web-gateway and R/RStudio

It is specifically worth noting that the ondemand.hpc.nau.edu-based RStudio “app” actually runs on the compute nodes, and as such, should not be used as an installation/compilation environment.

Instead, please use an interactive Linux shell (as shown below) to do the initial R package installations. As long as you perform the installation with the same version of R that you choose when launching OnDemand’s RStudio “app”, the libraries should then be available for immediate use.

Special cases

We have created a few package-specific sets of instructions for some of the most popular R packages:

  • rgdal
  • another (not yet linked)

General installation instructions

  1. Load R and any other prerequisites using the module system. For example:
    $ module load R/4.0.2 <module2> <module3>
  2. If necessary, activate your already-existing Anaconda environment to provide any additional dependencies:
    $ module load anaconda3
    $ conda activate myEnv
  3. Open an interactive R shell:
    $ R
  4. Execute the required (CRAN) installation:
    > install.packages('pkgName')
  5. When prompted, enter the number of any US-based mirror in the list.
  6. Be aware that packages you install are specific to the version of R you are using, as indicated by the placement of the installations in your ~/R/<version>/ directory.
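After the steps above, it can be useful to confirm from the shell that the package is visible to the R version you intend to use. The sketch below is one way to do that; the function name `check_r_package` is hypothetical, and `pkgName` and the version `4.0.2` are the placeholders used in the steps above.

```shell
#!/bin/bash
# check_r_package: hypothetical helper. Loads the given R module, prints the
# library paths R searches, and fails if the package cannot be found there.
check_r_package() {
    local r_version="$1" pkg="$2"
    module load "R/$r_version" || return 1
    # Each -e is a complete R expression; the package name is passed as a
    # trailing argument rather than pasted into the R code.
    Rscript \
        -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]' \
        -e 'cat("Library paths:", .libPaths(), sep = "\n")' \
        -e 'if (!requireNamespace(pkg, quietly = TRUE)) stop("package not found under this R version: ", pkg)' \
        -e 'cat("Found package:", pkg, "\n")' \
        "$pkg"
}

# Example (placeholders from the steps above):
#   check_r_package 4.0.2 pkgName
```

Running the check with a different R version than the one used for installation should fail, which reflects the version-specific library directories described in step 6.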

Using R packages with external dependencies

If your desired package requires external software during normal use (as opposed to simply during installation) then you’ll need to remember to make that software available to your environment before using it, or you will encounter errors.

When using an interactive shell or Slurm job script, just remember to first module load the required software and/or to activate any required Anaconda environment.
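For the Slurm case, the sketch below writes a minimal job script in which the module is loaded before R starts. Everything specific in it is an assumption for illustration: the helper name `write_r_job`, the job name, the time and memory requests, and the R script name `analysis.R` are placeholders, not Monsoon recommendations.

```shell
#!/bin/bash
# write_r_job: hypothetical helper that writes a minimal Slurm job script.
# Arguments: output file, R version, R script to run.
write_r_job() {
    local out="$1" r_version="$2" r_script="$3"
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=r_job
#SBATCH --time=00:10:00
#SBATCH --mem=1G

# Make R and any external dependencies available before R starts.
module load R/$r_version
# module load <module2> <module3>   # external software your package needs
# conda activate myEnv              # only if an Anaconda environment is required

Rscript $r_script
EOF
}

# Example (placeholder names):
#   write_r_job r_job.sh 4.0.2 analysis.R
#   sbatch r_job.sh
```

Writing the script to a file first makes it easy to inspect the module lines before submitting the job.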

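When using the RStudio “app” from our OnDemand gateway, you can use the same commands through R’s system() function. Note that each system() call runs in its own shell process, so environment changes made in one call do not carry over to the R session or to later calls; load the module and run the command that depends on it within the same call.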