Setting Up Local Galaxy Gateway Infrastructure: Getting Started, Setup & Ongoing Maintenance?

We learned about the Galaxy science gateway (https://galaxyproject.org/), and there is initial interest in our community in setting up a local, production-grade, multi-user Galaxy instance backed by our HPC cluster. The setup procedure looks involved, but not terrible:

https://docs.galaxyproject.org/en/latest/admin/production.html#using-a-compute-cluster
https://docs.galaxyproject.org/en/latest/admin/cluster.html

Some questions I have for installation:

  • How do we properly set up a multi-user environment that allows seamless collaboration (see below too)?
  • Any security concerns or advice?

I also have a question about how much ongoing work this would take in terms of maintenance, upgrades, troubleshooting, securing the infrastructure, etc. I wonder if people running local, production-grade Galaxy instances can share their experiences and lessons learned with respect to owning a Galaxy instance.

It is my understanding that this tool also supports collaboration (i.e., working as a group). How complex is it to set up such collaborative data spaces, workflows, and projects when a production HPC cluster backend is involved?

The details of the cluster: a Linux HPC cluster with the Slurm job scheduler. It has a shared NFS filesystem backed by Isilon (accessible in many different ways, not just via the HPC cluster), as well as Lustre scratch space.

I know these questions may initially be vague, and I may need to supply more details.

Thanks,
Wirawan

Slurm has some Galaxy-specific configuration tasks. If you don’t use Ansible to deploy Galaxy, it would be wise to review the playbook tasks anyway, because I believe the Galaxy team provides a separate Slurm build that you install on the Galaxy server. I ran research and clinical instances of Galaxy backed by an SGE HPC cluster. Finding sysadmin help to keep a Galaxy instance happy is a challenge. There is database maintenance and disk-space maintenance that keeps the system healthy, but most of the Galaxy work centers around job configuration and tool updates.
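To give a sense of what the Slurm-specific configuration involves, the cluster side of things lives in Galaxy’s job configuration file. A minimal sketch might look like the following (the partition name, walltime, memory, and destination id are placeholders, not values from your cluster):

```xml
<!-- job_conf.xml (sketch; partition/time/memory values are placeholders) -->
<job_conf>
    <plugins>
        <!-- the DRMAA-based Slurm runner that ships with Galaxy -->
        <plugin id="slurm" type="runner"
                load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    </plugins>
    <destinations default="slurm_cluster">
        <destination id="slurm_cluster" runner="slurm">
            <!-- passed through to sbatch for each job -->
            <param id="nativeSpecification">--partition=main --time=24:00:00 --mem=8G</param>
        </destination>
    </destinations>
</job_conf>
```

You would typically define several destinations (e.g., different partitions or memory classes) and map resource-hungry tools to the heavier ones.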

I’ve spun up several production Galaxy gateways over the years that have been integrated with various job schedulers (Slurm, PBS Torque, HTCondor) and heterogeneous HPC/HTC resources. The training documentation you linked to is generally kept up-to-date and represents the preferred mechanism (i.e., Ansible) for installing and supporting a production Galaxy instance. Additionally, the Galaxy admins tend to congregate on Gitter and are generally pretty responsive to newcomers trying to get into the Galaxy community.
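To illustrate the Ansible route, a minimal playbook using the community’s `galaxyproject.galaxy` role might be sketched as follows (the host group, install path, and job configuration path are assumptions for illustration):

```yaml
# playbook.yml (sketch; hostnames and paths are placeholders)
- hosts: galaxyservers
  become: true
  vars:
    galaxy_root: /srv/galaxy
    galaxy_config:
      galaxy:
        # point Galaxy at a job configuration that submits to the cluster
        job_config_file: "{{ galaxy_root }}/config/job_conf.xml"
  roles:
    - galaxyproject.galaxy
```

The role handles cloning Galaxy, creating the virtualenv, and templating the configuration, which is most of what keeps upgrades repeatable.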

Depending on the scale and complexity of the Galaxy instance you’re looking to deploy and maintain, it will require anywhere from a fraction of an FTE sysadmin to a full sysadmin dedicated to the care and feeding of the systems. Much of the system administration can be heavily automated, but the more features you want (e.g., Shibboleth authentication, Jupyter integration), the more time and effort you’ll need to expend to configure everything appropriately. That said, Galaxy itself generally supports everything you mentioned with regard to collaboration.
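On the authentication point: one common way to wire up Shibboleth is to let the front-end proxy handle it and have Galaxy trust the remote-user header it sets. A sketch of the relevant `galaxy.yml` options (the mail domain is a placeholder):

```yaml
# galaxy.yml (sketch; the maildomain value is a placeholder)
galaxy:
  # trust the REMOTE_USER header set by the Shibboleth-protected proxy
  use_remote_user: true
  # appended to the remote username to form the Galaxy account email
  remote_user_maildomain: example.edu
```

The proxy must be configured to strip and re-set that header itself, otherwise users could spoof accounts.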