Parallelism and Memory Bandwidth in a Stencil Code (C++/Windows)

Online Course

Memory bandwidth is the rate at which data can be delivered to processor cores. To extract the most bandwidth from Intel® processor-based systems, applications must use threading, be vectorized, access data contiguously, control thread affinity, be NUMA-friendly, reuse data in caches, and issue streaming stores when appropriate. This course demonstrates how to use the Intel® Parallel Studio XE tools to optimize memory bandwidth utilization in applications written in C++ (most techniques also apply to C).
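
For illustration, the following minimal sketch (not taken from the course material; the array size, loop body, and OpenMP pragmas are assumptions) shows how several of these techniques map onto a simple bandwidth-bound loop:

    // Minimal sketch of a bandwidth-bound loop that is threaded, contiguous,
    // and NUMA-friendly. Compile with OpenMP enabled (for example, /Qopenmp
    // with the Intel® C++ compiler on Windows).
    #include <cstddef>

    int main() {
        const std::size_t n = 1u << 26;   // illustrative size: ~512 MB per array of doubles
        double* a = new double[n];
        double* b = new double[n];

        // NUMA-friendly first touch: initialize the arrays with the same
        // thread layout as the compute loop, so each memory page is placed
        // on the NUMA node of the thread that will later access it.
        #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i) { a[i] = 0.0; b[i] = 1.0; }

        // Threaded, unit-stride (contiguous) loop that the compiler can
        // vectorize; streaming stores for 'a' can additionally be requested
        // through compiler options or Intel-specific pragmas.
        #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i)
            a[i] = 3.0 * b[i];

        delete[] a;
        delete[] b;
        return 0;
    }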

In this course, you will learn:

  1. How to measure the bandwidth of your system with the STREAM benchmark (see the sketch after this list)
  2. Which programming techniques and Intel® C++ compiler arguments to use to maximize bandwidth
  3. How to use Intel® VTune™ Amplifier Application Performance Snapshot to guide bandwidth tuning in your application
  4. How to model parallelism in Intel® Advisor
  5. How to get vectorization advice from Intel® Advisor
  6. How to perform Roofline analysis in Intel® Advisor
  7. How to run memory access analysis in Intel® VTune™ Amplifier
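
As a preview of item 1, a simplified STREAM-triad-style measurement is sketched below. This is not the official STREAM benchmark; the array size and the byte-counting convention are assumptions chosen for illustration.

    // Simplified STREAM-triad-style bandwidth measurement: a[i] = b[i] + s*c[i].
    #include <cstddef>
    #include <cstdio>
    #include <omp.h>

    int main() {
        const std::size_t n = 1u << 26;              // ~512 MB per array of doubles
        double* a = new double[n];
        double* b = new double[n];
        double* c = new double[n];

        #pragma omp parallel for                     // first-touch initialization
        for (std::size_t i = 0; i < n; ++i) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

        const double s = 3.0;
        const double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i)
            a[i] = b[i] + s * c[i];
        const double t1 = omp_get_wtime();

        // STREAM counts two reads (b, c) and one write (a) per element;
        // without streaming stores the hardware may also read 'a' before writing it.
        const double gbytes = 3.0 * static_cast<double>(n) * sizeof(double) / 1.0e9;
        std::printf("Triad: %.1f GB/s\n", gbytes / (t1 - t0));

        delete[] a; delete[] b; delete[] c;
        return 0;
    }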

The course is structured as a combination of instructional material (30%) and hands-on guided exercises (70%). After enrolling in the course, you can download the source code of the hands-on exercise, a bandwidth-bound 9-point stencil kernel, compile the application on your computer, and proceed with the lessons. You will run and analyze the application with the Intel® Parallel Studio XE tools and improve its bandwidth utilization. You can also download and open pre-collected Intel® Advisor and Intel® VTune™ Amplifier results gathered on a two-socket server based on Intel® Xeon® Scalable processors with a total of 12 CPU cores.
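
To give an idea of what such a kernel looks like, here is a generic 9-point stencil sweep (a hypothetical sketch with assumed weights and data layout, not the course's actual exercise code): each output point is a weighted sum of its 3x3 neighborhood, and because each loaded value is reused only a few times, the sweep is typically limited by memory bandwidth rather than arithmetic.

    // Generic 9-point stencil sweep over a 2D grid stored row-major.
    #include <cstddef>

    void stencil9(const float* in, float* out,
                  std::size_t nx, std::size_t ny, const float w[3][3]) {
        #pragma omp parallel for                       // thread over rows
        for (std::size_t i = 1; i + 1 < ny; ++i) {
            for (std::size_t j = 1; j + 1 < nx; ++j) { // unit stride: vectorizable
                out[i * nx + j] =
                      w[0][0] * in[(i - 1) * nx + j - 1]
                    + w[0][1] * in[(i - 1) * nx + j]
                    + w[0][2] * in[(i - 1) * nx + j + 1]
                    + w[1][0] * in[i * nx + j - 1]
                    + w[1][1] * in[i * nx + j]
                    + w[1][2] * in[i * nx + j + 1]
                    + w[2][0] * in[(i + 1) * nx + j - 1]
                    + w[2][1] * in[(i + 1) * nx + j]
                    + w[2][2] * in[(i + 1) * nx + j + 1];
            }
        }
    }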

Slides:
PSXE-2018-Parallelism-and-Memory-Bandwidth-in-a-Stencil-Code-C-Win.pptx
PSXE-2018-Parallelism-and-Memory-Bandwidth-in-a-Stencil-Code-C-Win.pdf
