Multi-Core Architectures and Programming [CSE Department]
An Introduction to Parallel Programming by Peter S. Pacheco
Chapter 1 : Why Parallel Computing?
1. Why Parallel Computing?
2. Why We Need Ever-Increasing Performance
3. Why We're Building Parallel Systems
4. Why We Need to Write Parallel Programs
5. How Do We Write Parallel Programs?
6. Concurrent, Parallel, Distributed
Chapter 2 : Parallel Hardware and Parallel Software
1. Parallel Hardware and Parallel Software
2. Some Background: The von Neumann Architecture, Processes, Multitasking, and Threads
3. Modifications to the von Neumann Model
4. Parallel Hardware
5. Parallel Software
6. Input and Output
7. Performance of Parallel Programs
8. Parallel Program Design with example
9. Writing and Running Parallel Programs
10. Assumptions in Parallel Programming
Chapter 3 : Distributed Memory Programming with MPI
1. Distributed-Memory Programming with MPI
2. The Trapezoidal Rule in MPI (a minimal sketch appears after this chapter's topic list)
3. Dealing with I/O
4. Collective Communication
5. MPI Derived Datatypes
6. Performance Evaluation of MPI Programs
7. A Parallel Sorting Algorithm
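A minimal sketch of the trapezoidal rule with MPI collective communication, as named in items 2 and 4 above. The interval [0, 3], the integrand f(x) = x*x, and the trapezoid count of 1024 are assumptions made only for this illustration, and the process count is assumed to divide the trapezoid count evenly; the book's version also covers input handling and derived datatypes.

    #include <mpi.h>
    #include <stdio.h>

    /* Integrand chosen only for this sketch. */
    static double f(double x) { return x * x; }

    /* Serial trapezoidal rule over [a, b] with n trapezoids. */
    static double trap(double a, double b, int n) {
        double h = (b - a) / n;
        double sum = (f(a) + f(b)) / 2.0;
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);
        return sum * h;
    }

    int main(void) {
        int rank, size;
        double a = 0.0, b = 3.0;   /* assumed global interval */
        int n = 1024;              /* assumed total number of trapezoids */

        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process integrates its own slice of [a, b]. */
        double h = (b - a) / n;
        int local_n = n / size;    /* assumes size divides n evenly */
        double local_a = a + rank * local_n * h;
        double local_int = trap(local_a, local_a + local_n * h, local_n);

        /* Collective communication: sum the partial results on process 0. */
        double total = 0.0;
        MPI_Reduce(&local_int, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Estimated integral: %f\n", total);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and launch with mpiexec -n <number of processes>.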
Chapter 4 : Shared Memory Programming with Pthreads
1. Shared-Memory Programming with Pthreads
2. Processes, Threads, and Pthreads
3. Pthreads - Hello, World Program
4. Matrix-Vector Multiplication
5. Critical Sections
6. Busy-Waiting
7. Mutexes
8. Producer-Consumer Synchronization and Semaphores
9. Barriers and Condition Variables
10. Read-Write Locks
11. Caches, Cache Coherence, and False Sharing
12. Thread-Safety
13. Shared-Memory Programming with OpenMP
14. The Trapezoidal Rule
15. Scope of Variables
16. The Reduction Clause (see the OpenMP sketch after this chapter's topic list)
17. The parallel for Directive
18. More About Loops in OpenMP: Sorting
19. Scheduling Loops
20. Producers and Consumers
21. Caches, Cache Coherence, and False Sharing
22. Thread-Safety
23. Parallel Program Development
24. Two n-Body Solvers
25. Parallelizing the basic solver using OpenMP
26. Parallelizing the reduced solver using OpenMP
27. Evaluating the OpenMP codes
28. Parallelizing the solvers using pthreads
29. Parallelizing the basic solver using MPI
30. Parallelizing the reduced solver using MPI
31. Performance of the MPI solvers
32. Tree Search
33. Recursive depth-first search
34. Nonrecursive depth-first search
35. Data structures for the serial implementations
36. Performance of the serial implementations
37. Parallelizing tree search
38. A static parallelization of tree search using pthreads
39. A dynamic parallelization of tree search using pthreads
40. Evaluating the Pthreads tree-search programs
41. Parallelizing the tree-search programs using OpenMP
42. Performance of the OpenMP implementations
43. Implementation of tree search using MPI and static partitioning
44. Implementation of tree search using MPI and dynamic partitioning
45. Which API?
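A minimal OpenMP sketch of the trapezoidal rule using the parallel for directive and a reduction clause (items 14, 16, and 17 above). The interval, integrand, and trapezoid count are assumed values for illustration only; variable scope, scheduling, and the Pthreads variants are covered under the items listed above.

    #include <omp.h>
    #include <stdio.h>

    /* Integrand chosen only for this sketch. */
    static double f(double x) { return x * x; }

    int main(void) {
        double a = 0.0, b = 3.0;   /* assumed interval */
        int n = 1024;              /* assumed number of trapezoids */
        double h = (b - a) / n;
        double approx = (f(a) + f(b)) / 2.0;

        /* The reduction clause gives each thread a private copy of approx
         * and combines the copies with + when the loop finishes. */
        #pragma omp parallel for reduction(+: approx)
        for (int i = 1; i < n; i++)
            approx += f(a + i * h);

        printf("Estimated integral: %f (max threads: %d)\n",
               approx * h, omp_get_max_threads());
        return 0;
    }

Compile with an OpenMP-enabled compiler, for example gcc -fopenmp, and set OMP_NUM_THREADS to control the thread count.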
Multicore Application Programming: For Windows, Linux, and Oracle Solaris by Darryl Gove
Chapter 1 : Hardware, Processes, and Threads
1. Hardware, Processes, and Threads
2. Examining the Insides of a Computer
3. The Motivation for Multicore Processors
4. Supporting Multiple Threads on a Single Chip
5. Increasing Instruction Issue Rate with Pipelined Processor Cores
6. Using Caches to Hold Recently Used Data
7. Using Virtual Memory to Store Data
8. Translating from Virtual Addresses to Physical Addresses
9. The Characteristics of Multiprocessor Systems
10. How Latency and Bandwidth Impact Performance
11. The Translation of Source Code to Assembly Language
12. The Performance of 32-Bit versus 64-Bit Code
13. Ensuring the Correct Order of Memory Operations
14. The Differences Between Processes and Threads
Chapter 2 : Coding for Performance
1. Coding for Performance
2. Defining Performance
3. Understanding Algorithmic Complexity
4. Why Algorithmic Complexity Is Important
5. Using Algorithmic Complexity with Care
6. How Structure Impacts Performance
7. Performance and Convenience Trade-Offs in Source Code and Build Structures
8. Using Libraries to Structure Applications
9. The Impact of Data Structures on Performance
10. The Role of the Compiler
11. The Two Types of Compiler Optimization
12. Selecting Appropriate Compiler Options
13. How Cross-File Optimization Can Be Used to Improve Performance
14. Using Profile Feedback
15. How Potential Pointer Aliasing Can Inhibit Compiler Optimizations
16. Identifying Where Time Is Spent Using Profiling
17. Commonly Available Profiling Tools
18. How Not to Optimize
19. Performance by Design
Chapter 3 : Identifying Opportunities for Parallelism
1. Identifying Opportunities for Parallelism
2. Using Multiple Processes to Improve System Productivity
3. Multiple Users Utilizing a Single System
4. Improving Machine Efficiency Through Consolidation
5. Using Containers to Isolate Applications Sharing a Single System
6. Hosting Multiple Operating Systems Using Hypervisors
7. Using Parallelism to Improve the Performance of a Single Task
8. One Approach to Visualizing Parallel Applications
9. How Parallelism Can Change the Choice of Algorithms
10. Amdahl's Law (a small worked example follows this chapter's topic list)
11. Determining the Maximum Practical Threads
12. How Synchronization Costs Reduce Scaling
13. Parallelization Patterns
14. Data Parallelism Using SIMD Instructions
15. Parallelization Using Processes or Threads
16. Multiple Independent Tasks
17. Multiple Loosely Coupled Tasks
18. Multiple Copies of the Same Task
19. Single Task Split Over Multiple Threads
20. Using a Pipeline of Tasks to Work on a Single Item
21. Division of Work into a Client and a Server
22. Splitting Responsibility into a Producer and a Consumer
23. Combining Parallelization Strategies
24. How Dependencies Influence the Ability to Run Code in Parallel
25. Antidependencies and Output Dependencies
26. Using Speculation to Break Dependencies
27. Critical Paths
28. Identifying Parallelization Opportunities
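Item 10 above, Amdahl's Law, is easy to make concrete with a few lines of C. The parallel fraction p = 0.9 and the core counts below are assumed values for illustration only; the law itself says that the serial fraction of a program bounds the achievable speedup no matter how many cores are added.

    #include <stdio.h>

    /* Amdahl's Law: if a fraction p of the work can be parallelized and the
     * rest is serial, the speedup on n cores is bounded by
     *     speedup = 1 / ((1 - p) + p / n)                                   */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        double p = 0.9;                        /* assumed parallel fraction */
        int cores[] = {1, 2, 4, 8, 16, 64};
        for (int i = 0; i < 6; i++)
            printf("%3d cores: speedup %.2f\n", cores[i], amdahl(p, cores[i]));
        /* With p = 0.9 the speedup can never exceed 1 / (1 - p) = 10,
         * however many cores are used. */
        return 0;
    }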
Chapter 4 : Synchronization and Data Sharing
1. Synchronization and Data Sharing
2. Data Races
3. Using Tools to Detect Data Races
4. Avoiding Data Races
5. Synchronization Primitives
6. Mutexes and Critical Regions
7. Spin Locks
8. Semaphores
9. Readers-Writer Locks
10. Barriers
11. Atomic Operations and Lock-Free Code
12. Deadlocks and Livelocks
13. Communication Between Threads and Processes
14. Storing Thread-Private Data
Chapter 5 : Using POSIX Threads
1. Using POSIX Threads
2. Creating Threads (a minimal Pthreads sketch follows this chapter's topic list)
3. Compiling Multithreaded Code
4. Process Termination
5. Sharing Data Between Threads
6. Variables and Memory
7. Multiprocess Programming
8. Sockets
9. Reentrant Code and Compiler Flags
10. Windows Threading
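A minimal sketch of creating and joining POSIX threads and of sharing data between them (items 2 and 5 above). The thread count, loop bound, and per-thread arithmetic are arbitrary choices for the illustration; the mutex shows one way to make the shared update safe, anticipating the synchronization topics of the previous chapter.

    #include <pthread.h>
    #include <stdio.h>

    #define THREAD_COUNT 4                 /* assumed thread count */

    static long long total = 0;            /* data shared between threads */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        long id = (long) arg;
        long long local = 0;
        for (long i = 0; i < 1000000; i++) /* arbitrary private work */
            local += id + 1;

        pthread_mutex_lock(&lock);         /* serialize the shared update */
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t threads[THREAD_COUNT];
        for (long t = 0; t < THREAD_COUNT; t++)
            pthread_create(&threads[t], NULL, worker, (void *) t);
        for (long t = 0; t < THREAD_COUNT; t++)
            pthread_join(threads[t], NULL);
        printf("total = %lld\n", total);
        return 0;
    }

Compile multithreaded code with the -pthread flag (item 3 above), for example gcc -pthread.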
Chapter 6 : Windows Threading
1. Creating Native Windows Threads (a minimal sketch follows this chapter's topic list)
2. Terminating Threads
3. Creating and Resuming Suspended Threads
4. Using Handles to Kernel Resources
5. Methods of Synchronization and Resource Sharing
6. An Example of Requiring Synchronization Between Threads
7. Protecting Access to Code with Critical Sections
8. Protecting Regions of Code with Mutexes
9. Slim Reader/Writer Locks
10. Signaling Event Completion to Other Threads or Processes
11. Wide String Handling in Windows
12. Creating Processes
13. Sharing Memory Between Processes
14. Inheriting Handles in Child Processes
15. Naming Mutexes and Sharing Them Between Processes
16. Communicating with Pipes
17. Communicating Using Sockets
18. Atomic Updates of Variables
19. Allocating Thread-Local Storage
20. Setting Thread Priority
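A minimal sketch of creating a native Windows thread, waiting on its handle, and releasing the kernel resource (items 1 and 4 above). The message printed by the worker is arbitrary, and error handling is reduced to a bare minimum.

    #include <windows.h>
    #include <stdio.h>

    /* Thread entry point: Windows threads return a DWORD and use WINAPI. */
    static DWORD WINAPI worker(LPVOID param) {
        printf("Hello from thread %lu\n", GetCurrentThreadId());
        return 0;
    }

    int main(void) {
        /* CreateThread returns a handle to a kernel object. */
        HANDLE thread = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        if (thread == NULL)
            return 1;

        /* Block until the thread finishes, then free the handle. */
        WaitForSingleObject(thread, INFINITE);
        CloseHandle(thread);
        return 0;
    }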
Chapter 7 : Using Automatic Parallelization and OpenMP
1. Using Automatic Parallelization and OpenMP
2. Using Automatic Parallelization to Produce a Parallel Application
3. Identifying and Parallelizing Reductions
4. Automatic Parallelization of Codes Containing Calls
5. Assisting Compiler in Automatically Parallelizing Code
6. Using OpenMP to Produce a Parallel Application
7. Using OpenMP to Parallelize Loops
8. Runtime Behavior of an OpenMP Application
9. Variable Scoping Inside OpenMP Parallel Regions
10. Parallelizing Reductions Using OpenMP
11. Accessing Private Data Outside the Parallel Region
12. Improving Work Distribution Using Scheduling (see the sketch after this chapter's topic list)
13. Using Parallel Sections to Perform Independent Work
14. Nested Parallelism
15. Using OpenMP for Dynamically Defined Parallel Tasks
16. Keeping Data Private to Threads
17. Controlling the OpenMP Runtime Environment
18. Waiting for Work to Complete
19. Restricting the Threads That Execute a Region of Code
20. Ensuring That Code in a Parallel Region Is Executed in Order
21. Collapsing Loops to Improve Workload Balance
22. Enforcing Memory Consistency
23. An Example of Parallelization
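A small sketch of items 10 and 12 above: a loop whose iterations have uneven cost, parallelized with a reduction and a dynamic schedule so that idle threads pick up the remaining chunks. The loop bound, the chunk size of 16, and the synthetic work() function are assumptions made for the illustration.

    #include <omp.h>
    #include <stdio.h>

    /* Work whose cost grows with i, so a static schedule would leave the
     * threads holding the last iterations with most of the work. */
    static double work(int i) {
        double x = 0.0;
        for (int j = 0; j < i; j++)
            x += 1.0 / (j + 1.0);
        return x;
    }

    int main(void) {
        double total = 0.0;
        #pragma omp parallel for schedule(dynamic, 16) reduction(+: total)
        for (int i = 0; i < 10000; i++)
            total += work(i);
        printf("total = %f (max threads: %d)\n", total, omp_get_max_threads());
        return 0;
    }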
Chapter 8 : Hand Coded Synchronization and Sharing
1. Hand-Coded Synchronization and Sharing
2. Atomic Operations
3. Using Compare and Swap Instructions to Form More Complex Atomic Operations (a small sketch follows this chapter's topic list)
4. Enforcing Memory Ordering to Ensure Correct Operation
5. Compiler Support of Memory-Ordering Directives
6. Reordering of Operations by the Compiler
7. Volatile Variables
8. Operating System-Provided Atomics
9. Lockless Algorithms
10. Dekker's Algorithm
11. Producer-Consumer with a Circular Buffer
12. Scaling to Multiple Consumers or Producers
13. Scaling the Producer-Consumer to Multiple Threads
14. Modifying the Producer-Consumer Code to Use Atomics
15. The ABA Problem
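A sketch of item 3 above, written with C11 atomics rather than raw processor instructions or OS-provided intrinsics: compare-and-swap in a retry loop builds an atomic "add" out of a single hardware primitive. The counter and the amounts are illustrative; a production version would also consider memory ordering and the ABA problem listed above.

    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic long counter = 0;     /* shared value updated lock-free */

    /* Atomically add amount to counter using compare-and-swap.
     * On failure, expected is reloaded with the current value and we retry. */
    static void atomic_add(long amount) {
        long expected = atomic_load(&counter);
        while (!atomic_compare_exchange_weak(&counter, &expected,
                                             expected + amount))
            ;   /* another thread won the race; retry with the new value */
    }

    int main(void) {
        atomic_add(5);
        atomic_add(7);
        printf("counter = %ld\n", atomic_load(&counter));
        return 0;
    }

Requires a C11 compiler with atomics support.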
Chapter 9 : Scaling with Multicore Processors
1. Scaling with Multicore Processors
2. Constraints to Application Scaling
3. Hardware Constraints to Scaling
4. Bandwidth Sharing Between Cores
5. False Sharing (a small sketch follows this chapter's topic list)
6. Cache Conflict and Capacity
7. Pipeline Resource Starvation
8. Operating System Constraints to Scaling
9. Multicore Processors and Scaling
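Item 5 above, false sharing, is usually cured by padding or by keeping per-thread data apart. A minimal sketch of the padding idea, assuming a 64-byte cache line and four threads purely for illustration:

    #include <stdio.h>

    #define CACHE_LINE 64   /* assumed cache-line size in bytes */
    #define NTHREADS   4    /* assumed number of threads */

    /* Unpadded: adjacent counters share a cache line, so writes from
     * different cores keep invalidating each other's copy of the line. */
    struct plain_counter { volatile long value; };

    /* Padded: each counter occupies its own cache line, so threads that
     * update different counters no longer contend. */
    struct padded_counter {
        volatile long value;
        char pad[CACHE_LINE - sizeof(long)];
    };

    int main(void) {
        struct plain_counter  hot[NTHREADS];    /* prone to false sharing */
        struct padded_counter cool[NTHREADS];   /* one cache line per counter */
        printf("plain: %zu bytes each, padded: %zu bytes each\n",
               sizeof hot[0], sizeof cool[0]);
        return 0;
    }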
Chapter 10 : Other Parallelization Technologies
1. Other Parallelization Technologies
2. GPU-Based Computing
3. Language Extensions
4. Alternative Languages
5. Clustering Technologies
6. Transactional Memory
7. Vectorization