Memory bandwidth is the rate at which data can be delivered to processor cores. To extract maximum bandwidth from Intel® processor-based systems, applications must be threaded and vectorized, access data contiguously, control thread affinity, be NUMA-aware, reuse data in caches, and issue streaming stores where appropriate. This course demonstrates how to use the Intel® Parallel Studio XE tools to optimize memory bandwidth in applications written in C++ (most techniques also apply to C).
In this course, you will learn:
- How to measure the bandwidth of your system using the STREAM benchmark
- What programming techniques and Intel® C++ compiler arguments to use to maximize bandwidth
- How to use Intel® VTune™ Amplifier Application Performance Snapshot to guide bandwidth tuning in your application
- How to model parallelism in Intel® Advisor
- How to get vectorization advice in Intel® Advisor
- How to perform roofline analysis in Intel® Advisor
- How to analyze memory accesses in Intel® VTune™ Amplifier
The course combines instructional material (30%) with a hands-on guided exercise (70%). After enrolling in the course, you can download the source code of the exercise, a bandwidth-bound 9-point stencil kernel, compile it on your own computer, and proceed with the lessons. You will run and analyze the application with the Intel® Parallel Studio XE tools and improve its bandwidth utilization. You can also download and open Intel® Advisor and Intel® VTune™ Amplifier results collected on a server based on two Intel® Xeon® Scalable processors with a total of 12 CPU cores.