HPC: How do you ask for help?

vsoch · October 4, 2019, 3:37pm

When you’re testing a script, or running jobs on a cluster, things can go wrong. You get an ugly error message, your script exits, and then what? Part of learning to be a researcher or student is knowing how to ask for help. With this in mind, let’s talk about all the different ways that you might ask:

attending office hours to interact with support staff at your institution
emailing support staff directly
submitting a ticket to a help desk
using a command line tool to submit a help request (e.g., helpme)
opening up an issue on a bug tracker (e.g., GitHub)

If you are a student - what are your favorite ways to ask for help? How quickly do you get a response, what is the quality of the response, and how do those two things relate? How could support be better?

On the flip side, if you are a provider of help (a maintainer on a GitHub board or a support staff) what kind of tools or services make your life easier? Is there a format or a tool that works really well, or that doesn’t?

For this question - for support providers and receivers alike - let’s talk about how we can do this better. If you could design the perfect interaction, what would it look like?

jeremymann · October 4, 2019, 7:53pm

As a support provider, the two main issues I have with users are that they never provide enough information related to their problem, and never read our documentation.

Issue #1, they email us directly with something like “Hey, my job didn’t run, can you take a look?” I email back with the common questions, what is your username, what is the location of your work directory, where are your job scripts, etc… 3 to 4 emails later I finally have enough information to diagnose the problem. Central IT has a ticket system but users rarely (if ever) use it and they just email us directly.
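To make the "common questions" above concrete, here is a minimal sketch of a script a user could run before emailing, so the first message already carries a username, working directory, and job script location. The function name `build_help_request`, the field labels, and the example file name `run_job.sh` are all hypothetical and not part of any tool mentioned in this thread.

```python
import getpass
import os
import platform


def current_user():
    """Best-effort username lookup; falls back to 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def build_help_request(job_script, summary):
    """Return a plain-text help request with the details support usually asks for.

    job_script: path to the job script that failed (hypothetical example input).
    summary: one-line description of what went wrong.
    """
    lines = [
        f"Summary: {summary}",
        f"Username: {current_user()}",
        f"Host: {platform.node()}",
        f"Working directory: {os.getcwd()}",
        f"Job script: {os.path.abspath(job_script)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into the email or ticket.
    print(build_help_request("run_job.sh", "My job didn't run"))
```

A site would likely want more fields (scheduler job ID, loaded modules, the error text itself), but even these five lines can save a round or two of email.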

Issue #2, most of the problems that arise from their jobs we have documented in our support wiki. I am constantly sending the link to users (new and old) and politely suggesting that they read the wiki.

vsoch · October 4, 2019, 8:07pm

@jeremymann you might be interested in a tool like HelpMe then - it will submit the user environment and a screen recording to a service of your choosing (e.g., UserVoice is our ticketing system https://vsoch.github.io/helpme/helper-uservoice) and there is also support for GitHub and Discourse. If you have some other system with an API I’d be happy to add that as an integration! And we can also customize the helper further to have some other kind of recorder (aside from terminal asciinema recording) or collector (aside from the environment).

shmget · October 4, 2019, 11:06pm

I’m a support provider as well. We have a help desk system, and we usually insist that people put in a ticket or at least put in one afterwards if it is an emergency. I’ve found that our support organization has a similar issue to what Jeremy mentioned. We might get a ticket with very vague information, and then we have to try and figure it out. Or we get led down a path because someone thinks it is one thing but it is actually another. For instance, someone might say the code said the network was unavailable, but their job actually used up RAM and the kernel killed it.
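As an illustration of checking the out-of-memory case directly, here is a minimal sketch that scans kernel log text for OOM-killer messages. It assumes the log lines look like "Out of memory: Killed process 1234 (name)" (older kernels print "Kill process" instead); the exact wording varies by kernel version, and reading the kernel log may require elevated permissions on some systems. The function name `find_oom_kills` is hypothetical.

```python
import re

# Assumed message format; adjust the pattern if your kernel words it differently.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) pairs, one per OOM kill found in log_text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


if __name__ == "__main__":
    # Made-up sample lines for demonstration only, not real log output.
    sample = (
        "[1234.5] Out of memory: Killed process 4321 (my_solver) total-vm:1kB\n"
        "[1240.0] eth0: link up\n"
    )
    print(find_oom_kills(sample))
```

An empty result does not rule out a memory problem, since a scheduler may enforce its own memory limit and end the job before the kernel does.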

We also find that users of our clusters share information among themselves on how to set up their account to run a code. Most times it is correct, but when it isn’t, bad practice starts to spread.

I worked in engineering before moving to HPC application / Linux support. Because of that I find it easy to talk to our users. Simply going and making “house calls” and following up on “how are things working?” has helped to give a good impression of our group. I think in HPC developing relationships with the users has helped me immensely in providing support.

My favorite question is: “What are you trying to do?” or “What problem are you trying to solve?”. It allows me to get into the deeper aspects of how they are trying to run codes on the system.

vsoch · October 4, 2019, 11:34pm

For both of you, what’s the ratio between users and support? I like the idea of house calls, but I’d imagine that gets harder to do when you have fewer support staff per user on campus.

shmget · October 5, 2019, 2:30pm

I would say it is a pretty low ratio of support staff to users. We do our best. Most codes our users run are off the shelf, so most support happens when someone is getting an account set up, trying to run a new code, or something breaks.

william.wilson · October 7, 2019, 4:54pm

We have two methods for our people to ask for support: they may email our support email address (which we advertise all over), or they may put in a service ticket directly to our team.

hjmangalam · October 10, 2019, 3:16am

I sympathize with any support crew who play email ping-pong to try to extract the relevant information from a user who’s requesting help. In order to speed up the process, I’ve written 2 things; one is a HOWTO: How to ask a question (which I’ve added to an email reply template which I use to smack the user if the initial request isn’t enough to start work on the problem) and the other is a simple bash script called mayday…

The second is referenced in the first, but it’s worth noting separately. Being pure bash, it should not be hard to port to other clusters. It also uses the very handy termbin to keep emails reasonably short. YMMV whether it works for you, but I’d be interested in providing mods to other users if they have other systems that need to be incorporated (SLURM instead of SGE, etc.).

best
harry