NCT-300 Programming GPU Processors

This course covers concepts and approaches for programming GPU processors using both CUDA and OpenCL. Extensive coverage of GPU hardware, memories, data transport and performance optimization gives students an understanding of the fundamental aspects of GPU programming. In-depth, hands-on laboratories demonstrate how to apply common numerical methods to GPU processors using both the native APIs and numerical libraries. The course also covers methods of integrating the Intel Threading Building Blocks (TBB) threading abstraction layer with GPU software APIs.

 

Length: 4 Days | Cost: $3495

 

Who Should Attend

Software architects, software developers, software team leaders and managers seeking to develop GPU software. Knowledge of computer architecture, intermediate C++ programming skills and software development experience are mandatory prerequisites for this course.

Benefits

  • Teaches everything necessary to start developing high-performance GPU software on Linux or Windows platforms.
  • Covers both CUDA and OpenCL, including computational libraries such as CUBLAS, CUFFT and CUDPP.
  • Shows how to integrate multicore software development techniques with GPUs to increase performance.
  • A comprehensive training workshop: the course gives an in-depth overview of fundamental concepts along with advanced training and practical advice on GPU programming.
  • An online training delivery platform combined with instructor-led, hands-on laboratories provides in-depth instruction and increases students' knowledge and skills.

Course Objectives

  • Install GPU libraries and drivers, and compile CUDA/OpenCL programs on Linux and Windows operating systems.
  • Understand NVIDIA GPU hardware and the underlying technical concepts, including SIMD processing and hardware threading architectures.
  • Understand the different GPU programming APIs and their appropriate use with various applications.
  • Understand single- and double-precision floating-point calculations.
  • Understand the difference between GPU memory types and the advantages and disadvantages of each.
  • Effectively orchestrate the transport of data to and from GPU memory.
  • Correctly implement two common types of numerical algorithms: matrix multiplication and reduction.
  • Apply performance optimization techniques, including the cudaprof profiling tool, loop unrolling, coalesced memory access, memory bandwidth estimation and occupancy.
  • Get an introduction to, and lab practice with, commonly used computational libraries: CUBLAS, CUFFT and CUDPP.
  • Combine multicore processors and GPUs to take maximum advantage of modern platform performance.
  • Integrate the Threading Building Blocks (TBB) threading abstraction layer with GPU code and migrate TBB primitives to the GPU.
  • Discover how to take advantage of multiple GPUs in the same server.
  • Debug CUDA code with cuda-gdb, including the use of emulation mode with Valgrind.
  • Work through extensive hands-on laboratories with code examples in both CUDA and OpenCL.