Best way to determine if a program is memory-bound or CPU-bound?

What is a good way to find the bottlenecks in HPC code, for instance whether it is memory-bound or CPU-bound? Would tools such as the Intel Studio or open-source profiling tools be the best for this? Are there other commonly used open-source tools?

Intel VTune Profiler is one way: https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler/memory-storage.html

IBM has an article on using OProfile to do this on POWER: https://www.ibm.com/developerworks/library/l-evaluatelinuxonpower/index.html. That said, VTune is easier.

Another Intel tool, with a more wizard-like interface, is the Roofline Analysis in Intel Advisor: https://software.intel.com/content/www/us/en/develop/documentation/advisor-user-guide/top/survey-trip-counts-flops-and-roofline-analyses/roofline-analysis.html

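The Roofline model relates a program’s attainable performance to its arithmetic intensity (FLOPs per byte of memory traffic). Performance is capped by memory bandwidth at low intensity and by peak compute throughput at high intensity, so where a kernel falls tells you whether it is memory-bound or CPU-bound: https://en.wikipedia.org/wiki/Roofline_model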
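To make the Roofline idea concrete, here is a minimal sketch of the calculation in Python. The function names and the machine numbers (1000 GFLOP/s peak compute, 100 GB/s memory bandwidth) are made up for illustration; substitute the measured or vendor-quoted peaks for your own hardware. The kernel is a STREAM-triad-style loop, a[i] = b[i] + s * c[i], counted as 2 FLOPs per 24 bytes of traffic (three doubles, ignoring write-allocate traffic).

```python
def attainable_gflops(peak_gflops, peak_bw_gbs, intensity):
    """Roofline bound: the lower of the compute roof and the bandwidth roof.

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, peak_bw_gbs * intensity)


def classify(peak_gflops, peak_bw_gbs, intensity):
    """Compare the kernel's intensity with the ridge point of the machine."""
    ridge = peak_gflops / peak_bw_gbs  # FLOPs/byte where the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine, not a real part: replace with your own peaks.
    PEAK_GFLOPS = 1000.0
    PEAK_BW_GBS = 100.0

    # Triad-style kernel: 2 FLOPs (one multiply, one add) per element,
    # 3 doubles moved per element (2 loads + 1 store) = 24 bytes.
    triad_intensity = 2 / 24

    print("bound (GFLOP/s):", attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, triad_intensity))
    print("classification:", classify(PEAK_GFLOPS, PEAK_BW_GBS, triad_intensity))
```

This only gives the theoretical ceiling from hand-counted FLOPs and bytes. A tool like Advisor measures both for you and plots where each loop actually lands relative to the roofs.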

ARM Performance Reports is another good tool to start with, though it is a bit expensive if you don’t have a site license.