We have some projects that run I/O-heavy calculations on our university HPC cluster, and we are considering moving them to the cloud, probably AWS. I assume the data will reside in a cloud data center, so latency and bandwidth could be affected by geographical distance. On top of that, the properties and capacities of both the client and the cloud storage back end will most likely affect the performance seen by the user running the job.
How does the degree of parallelization of the storage itself affect performance? Most likely this depends on the size of the chunks of data (objects) being requested, as well as on how frequently they are requested. The client configuration (processing speed, memory, and its own storage properties) must also influence I/O performance in the cloud. Does anyone have recent numbers (and impressions) they could share?
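To make the question more concrete, here is a rough sketch of the simple model I have in mind for how object size and the number of concurrent requests interact. It is only a back-of-the-envelope estimate, not a description of how AWS actually behaves: the function name and every parameter value are placeholders I made up, and it assumes each request pays a fixed latency, streams at a fixed rate, and that the client network link is the only shared limit.

```python
def effective_throughput(object_bytes, streams, latency_s, per_stream_bw, client_bw):
    """Estimate aggregate read throughput in bytes per second.

    Assumed model (hypothetical, for discussion only):
      - each request costs a fixed latency plus transfer time at per_stream_bw
      - `streams` requests run fully in parallel
      - the total is capped by the client's network bandwidth, client_bw
    """
    time_per_object = latency_s + object_bytes / per_stream_bw
    aggregate = streams * object_bytes / time_per_object
    return min(aggregate, client_bw)


if __name__ == "__main__":
    # Placeholder inputs, not measurements.
    latency_s = 0.05        # assumed per-request latency in seconds
    per_stream_bw = 50e6    # assumed bytes/s for a single stream
    client_bw = 1e9         # assumed client network limit in bytes/s

    for object_bytes in (64 * 1024, 8 * 1024**2, 256 * 1024**2):
        for streams in (1, 8, 64):
            rate = effective_throughput(
                object_bytes, streams, latency_s, per_stream_bw, client_bw
            )
            print(f"{object_bytes:>12} B x {streams:>3} streams: {rate / 1e6:8.1f} MB/s")
```

In this model small objects are dominated by per-request latency, so adding streams helps a lot, while large objects reach the client-side cap with only a few streams. I would like to know how far real measurements deviate from this.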
Thank you!