MPI C Programming for NVidia Jetson TX
Duration: 5 Days
Intended Audience
This course is for experienced C/C++ programmers with some familiarity with CUDA who need not only to get up to speed with MPI programming, but also to explore its practical use on networks of multiple NVidia Jetson TX2 devices.
Course Overview
Parallel programming by definition involves co-operation between processes to solve a common task. It is up to the programmer to define the tasks that will be executed by the processors, and how these tasks synchronise and exchange data with one another. In the message-passing model the tasks are separate processes that communicate and synchronise by explicitly sending each other messages. All such parallel operations are performed via calls to a message-passing library that is entirely responsible for managing the physical communication network linking the processors together. The Message Passing Interface (MPI) is the de facto standard for message passing. This course covers the key aspects of MPI programming, such as point-to-point communication, non-blocking operations, derived datatypes, virtual topologies, and collective communication. It also covers general parallel program design issues. The course is taught using a class network of NVidia Jetson TX2 processors and PC computers running Linux. It also covers applications that combine MPI and CUDA; a brief sketch of such a combination appears after the course contents list below.
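The following is a minimal sketch of this model: two processes exchange a single integer via blocking point-to-point calls. The compiler wrapper and launcher commands (mpicc, mpirun) are assumptions and depend on the MPI installation used in class.

```c
/* Minimal sketch of the message-passing model: two processes exchange
   a message with MPI_Send / MPI_Recv.
   Typical build/run (assumed): mpicc hello.c -o hello && mpirun -np 2 ./hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0 && size > 1) {
        int payload = 42;
        /* Blocking standard-mode send to rank 1, tag 0 */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Status status;
        /* Blocking receive from rank 0, tag 0 */
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```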
Course Contents
- Distributed memory and shared memory computing models
- Message-Passing Concepts
- Features of message passing programs
- Point-to-Point Communications and Messages
- Communication Modes and Completion Criteria
- Blocking and Nonblocking Communication
- Collective Communications
- Broadcast Operations
- Scatter and Gather Operations
- Reduction Operations
- MPI Routines and Return Values
- MPI Handles
- MPI Datatypes
- Communicators
- Tags
- Modes
- Sending and Receiving
- Blocking and Completion
- Deadlock and Deadlock Avoidance
- Nonblocking Sends and Receives
- Posting, Completion, and Request Handles
- Posting Sends and Receives without Blocking
- Completion - Waiting and Testing
- Send Modes
- Standard Mode Send
- Synchronous Mode Send
- Ready Mode Send
- Buffered Mode Send
- Buffer filling and MPI_Pack
- MPI_Type_struct and Mapping of C Structs to MPI Derived Types
- MPI_Type_contiguous
- MPI_Type_vector
- MPI_Type_hvector
- MPI_Type_indexed
- MPI_Type_hindexed
- Controlling the Extent of a Derived Type
- MPI_Barrier - Barrier Synchronisation
- MPI_Bcast - Broadcast
- MPI_Reduce - Reduction
- MPI_Gather - Gathering
- MPI_Allgather
- MPI_Scatter - Scattering
- MPI_Allreduce
- MPI_Gatherv
- MPI_Scatterv
- MPI_Scan
- MPI_Reduce_scatter
- MPI_COMM_WORLD
- MPI_Comm_group
- MPI_Group_incl
- MPI_Group_excl
- MPI_Group_rank
- MPI_Group_free
- MPI_Comm_create
- MPI_Comm_split
- MPI_Cart_create
- MPI_Cart_coords
- MPI_Cart_rank
- MPI_Cart_shift
- MPI_Cart_sub
- MPI_Cartdim_get
- MPI_Cart_get
- Matrix Transposition
- Iterative Solvers
- Characteristics of Serial I/O
- Characteristics of Parallel I/O
- Introduction to MPI-2 Parallel I/O
- MPI-2 File Structure
- Initializing MPI-2 File I/O
- File Views
- Data Access - Reading Data
- Data Access - Writing Data
- Closing MPI-2 File I/O
- PBLAS - Parallel Basic Linear Algebra Subprograms
- ScaLAPACK - Scalable Linear Algebra PACkage
- Domain decomposition
- Functional decomposition
- Load balancing
- Minimising Communication
- Designing for Performance
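As a closing illustration of the MPI-plus-CUDA applications mentioned in the course overview, the sketch below shows one way the two can be combined on a Jetson network: each rank runs a small CUDA kernel on its local GPU, then the partial results are combined with MPI_Reduce. The kernel, data sizes, and build commands are illustrative assumptions, not course material.

```c
/* Hypothetical MPI + CUDA sketch: each rank squares a local array on its
   GPU, then MPI_Reduce sums the partial results on rank 0.
   Typical build (assumed): nvcc -c kernel.cu; mpicc main.o kernel.o ... */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void square(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];
}

int main(int argc, char *argv[])
{
    const int n = 1024;                 /* elements per rank (assumed) */
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float h_in[1024], h_out[1024];
    for (int i = 0; i < n; i++)
        h_in[i] = (float)(rank * n + i);

    /* Run the kernel on this rank's local Jetson GPU */
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    /* Combine partial sums across the network with MPI */
    double local_sum = 0.0, global_sum = 0.0;
    for (int i = 0; i < n; i++)
        local_sum += h_out[i];
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global sum of squares: %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```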