Lectures and Readings : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016 (2024)

Parallel Computer Architecture and Programming (CMU 15-418/618)

This page contains lecture slides, videos, and recommended readings for the Spring 2016 offering of 15-418/618. The full listing of lecture videos is available on the Panopto site.

Lecture 1: Why Parallelism

Lecture 2: A Modern Multi-Core Processor

(forms of parallelism + understanding latency and bandwidth)

Lecture 3: Parallel Programming Models

(ways of thinking about parallel programs, and their corresponding hardware implementations)

Lecture 4: Parallel Programming Basics

(the thought process of parallelizing a program)

Lecture 5: GPU Architecture and CUDA Programming

(CUDA programming abstractions, and how they are implemented on modern GPUs)

Lecture 6: Performance Optimization I: Work Distribution and Scheduling

(achieving good workload balance while minimizing the overhead of making the assignment, scheduling Cilk programs with work stealing)

Lecture 7: Performance Optimization II: Locality, Communication, and Contention

(message passing, async vs. blocking sends/receives, pipelining, techniques to increase arithmetic intensity, avoiding contention)

Lecture 8: Parallel Programming Case Studies

(examples of optimizing parallel programs)

Lecture 9: Workload-Driven Performance Evaluation

(hard vs. soft scaling, memory-constrained scaling, scaling problem size, tips for analyzing code performance)

Lecture 10: Snooping-Based Cache Coherence

(definition of memory coherence, invalidation-based coherence using MSI and MESI, maintaining coherence with multi-level caches, false sharing)
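As a rough illustration of invalidation-based coherence, the sketch below tracks the MSI state of a single cache line across several caches on a shared bus. It is not taken from the lecture materials: the class name `MSILine` and its methods are hypothetical, data values and write-back traffic are not modeled, and only the state transitions are shown.

```python
class MSILine:
    """Tracks the MSI state of ONE cache line in each of several caches.

    Hypothetical teaching sketch: states only, no data, no bus timing.
    """

    def __init__(self, num_caches):
        # Every cache starts with the line Invalid.
        self.state = ["I"] * num_caches

    def read(self, cid):
        """Processor `cid` loads from the line."""
        if self.state[cid] == "I":
            # Read miss: a BusRd is broadcast. A cache holding the line
            # Modified would flush it and downgrade to Shared.
            for other in range(len(self.state)):
                if other != cid and self.state[other] == "M":
                    self.state[other] = "S"
            self.state[cid] = "S"
        # A read hit in S or M needs no bus transaction.

    def write(self, cid):
        """Processor `cid` stores to the line."""
        if self.state[cid] != "M":
            # Write to an I or S line: a BusRdX is broadcast and every
            # other copy is invalidated before the write proceeds.
            for other in range(len(self.state)):
                if other != cid:
                    self.state[other] = "I"
            self.state[cid] = "M"
        # A write hit in M needs no bus transaction.


line = MSILine(num_caches=2)
line.read(0)   # cache 0: I -> S
line.read(1)   # cache 1: I -> S, both now share the line
line.write(0)  # cache 0: S -> M, cache 1 invalidated
```

The invariant this preserves is that a line in state M in one cache is Invalid in every other cache, so there is at most one writer at a time.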

Lecture 11: Directory-Based Cache Coherence

(scaling problem of snooping, implementation of directories, directory storage optimization)

Lecture 12: A Basic Snooping-Based Multi-Processor Implementation

(deadlock, livelock, starvation, implementation of coherence on an atomic and split-transaction bus)

Lecture 13: Memory Consistency

(consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics)

Lecture 14: Scaling a Web Site

(scale out, load balancing, elasticity, caching)

Lecture 15: Interconnection Networks

(network properties, topology, basics of flow control)

Lecture 16: Implementing Synchronization

(machine-level atomic operations, implementing locks, implementing barriers)

Lecture 17: Fine-Grained Synchronization and Lock-Free Programming

(fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers)

Lecture 18: Transactional Memory

(motivation for transactions, design space of transactional memory implementations, lazy-optimistic HTM)

Lecture 19: Heterogeneous Parallelism and Hardware Specialization

(energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, what's in a modern SoC)

Lecture 20: Domain-Specific Programming Systems

(motivation for DSLs, case studies on Liszt and Halide)

Lecture 21: Domain-Specific Programming on Graphs

(GraphLab abstractions, GraphLab implementation, streaming graph processing, graph compression)

Lecture 22: In-Memory Distributed Computing using Spark

(producer-consumer locality, RDD abstraction, Spark implementation and scheduling)

Lecture 23: Addressing the Memory Wall

(how DRAM works, cache compression, DRAM compression, upcoming memory technologies)

Lecture 24: The Future of High-Performance Computing

(supercomputing vs. distributed computing/analytics, design philosophy of both systems)

Lecture 25: Efficiently Evaluating Deep Networks

(intro to deep networks, what convolution does, mapping convolution to matrix multiplication, deep network compression)
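To make the convolution-to-matrix-multiplication mapping concrete, here is a minimal pure-Python sketch of the common "im2col" approach. It assumes a single-channel image, one filter, stride 1, and no padding; the function names are illustrative and not from the lecture. As is usual in deep learning, "convolution" here means cross-correlation (the kernel is not flipped).

```python
def conv2d_direct(image, kernel):
    """Reference: valid, stride-1 2D cross-correlation with nested loops."""
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kH + 1):
        row = []
        for j in range(W - kW + 1):
            acc = 0
            for di in range(kH):
                for dj in range(kW):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def im2col(image, kH, kW):
    """Build a matrix with one row per output pixel.

    Each row is the flattened kH x kW input patch under that output pixel.
    """
    H, W = len(image), len(image[0])
    rows = []
    for i in range(H - kH + 1):
        for j in range(W - kW + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kH) for dj in range(kW)])
    return rows


def conv2d_as_matmul(image, kernel):
    """Same result as conv2d_direct, expressed as a matrix-vector product."""
    kH, kW = len(kernel), len(kernel[0])
    patches = im2col(image, kH, kW)                 # (num_outputs, kH*kW)
    flat_kernel = [w for krow in kernel for w in krow]  # (kH*kW,)
    flat_out = [sum(p * w for p, w in zip(patch, flat_kernel))
                for patch in patches]
    out_w = len(image[0]) - kW + 1
    # Reshape the flat result back into a 2D output image.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_as_matmul(image, kernel)
```

With many filters, the flattened kernels become the columns of a second matrix and the whole layer becomes one dense matrix-matrix multiply. The trade-off is that im2col duplicates overlapping input pixels, so it uses more memory in exchange for reusing a highly tuned matrix-multiply routine.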

Lecture 26: Parallel Deep Network Training

(basics of gradient descent and backpropagation, memory footprint issues, asynchronous parallel implementations of gradient descent)

Lecture 27: Parallelizing the 3D Graphics Pipeline

(parallel rasterization, Z/color-buffer compression, tiled rendering, sort-everywhere parallel rendering)

Lecture 28: Course Wrap Up + How to Give a Talk

(tips for giving a clear talk, a bit of philosophy)

Student Final Projects

(the students explore high-performance and high-efficiency topics of their choosing)
