Transferring multiple TB-sized files between my local cluster and AWS S3 storage

My group regularly has to transfer multiple large (several TB) files to AWS for processing, and we would like to use our institute’s Globus service to move them more efficiently. How can I use Globus to transfer multiple TB-sized files between my local cluster and AWS S3 storage in a short amount of time?

CURATOR: jpessin1

Globus offers a special tool to connect to AWS’s S3 storage; unsurprisingly, it’s called the “Amazon Web Services S3 Connector.”

You install Globus Connect Server on an EC2 instance, point it at a bucket through the S3 connector, configure the credentials, and you have a Globus endpoint.
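As a rough sketch of that server-side setup (this assumes Globus Connect Server v5 and the premium S3 connector on your Globus subscription; the endpoint name, emails, domain, bucket, and gateway ID below are placeholders, and exact flags can differ by version, so treat this as a starting point and follow the linked docs):

    # On the EC2 instance, after installing the globus-connect-server
    # package from the Globus repositories:

    # 1. Register the endpoint (writes deployment-key.json locally).
    globus-connect-server endpoint setup "My S3 Endpoint" \
        --organization "My Institute" \
        --owner admin@example.edu \
        --contact-email admin@example.edu

    # 2. Register this instance as the endpoint's first data transfer node.
    sudo globus-connect-server node setup

    # 3. Authenticate the admin CLI against the new endpoint.
    globus-connect-server login localhost

    # 4. Create an S3 storage gateway; --s3-user-credential means each
    #    user registers their own AWS access keys with the gateway.
    globus-connect-server storage-gateway create s3 "AWS S3 Gateway" \
        --domain example.edu \
        --s3-endpoint https://s3.amazonaws.com \
        --s3-user-credential

    # 5. Create a collection rooted at your bucket, using the gateway ID
    #    printed by the previous command.
    globus-connect-server collection create <GATEWAY_ID> /my-bucket/ "S3 Collection"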

The Globus instructions are here:

https://docs.globus.org/premium-storage-connectors/aws-s3/

and Northwestern has a nice walk-through (along with other considerations) that is generally useful:
https://kb.northwestern.edu/using-globus-with-s3
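Once the collection is up, starting the actual transfer is the easy part. Here is a minimal sketch with the Globus CLI (pip install globus-cli); the collection UUIDs, paths, and bucket name are placeholders you’d look up with "globus endpoint search" or in the Globus web app:

    # Authenticate the CLI once (opens a browser login flow).
    globus login

    SRC="<cluster-collection-uuid>"
    DST="<s3-collection-uuid>"

    # A single multi-TB file; checksum verification guards against corruption.
    globus transfer "$SRC:/scratch/run1.tar" "$DST:/my-bucket/run1.tar" \
        --label "run1 to S3" --verify-checksum

    # Several files in one task: one "source destination" pair per line.
    # (--batch FILE on recent CLI versions; older ones read the list on stdin.)
    printf '%s\n' \
        "/scratch/run1.tar /my-bucket/run1.tar" \
        "/scratch/run2.tar /my-bucket/run2.tar" > batch.txt
    globus transfer "$SRC" "$DST" --batch batch.txt --label "runs to S3"

    # Block until the task finishes (the task ID is printed on submission).
    globus task wait <TASK_ID>

Globus handles retries and integrity checking itself, so one task per batch of files is usually better than scripting many small tasks.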

One other thing to keep in mind: since the endpoint runs on an EC2 instance, it has its own network and I/O limits, so select an instance type with the throughput capacity you need. If you are dealing with both very large files and many of them, consider more instances rather than a bigger instance, as sketched below.
(Myself, I’d tend toward the m5 family, 4xlarge or smaller; YMMV.)
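On the “more instances” point: with Globus Connect Server v5 you can scale a single endpoint horizontally by registering additional EC2 instances as extra data transfer nodes, reusing the deployment key generated during endpoint setup; Globus then spreads transfer traffic across the nodes. Roughly (the exact node-setup options vary by GCS version, so check the docs):

    # On each additional EC2 instance:
    #   1. Install the same globus-connect-server package.
    #   2. Copy deployment-key.json over from the first node.
    #   3. Register the instance as another data transfer node:
    sudo globus-connect-server node setup --deployment-key deployment-key.json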