The primary differences occur in threading and memory
access:
- Threading resources. Execution pipelines on host
systems can support a limited number of concurrent threads. Servers
that have four quad-core processors today can run only 16 threads
concurrently (32 if the CPUs support Hyper-Threading). By comparison,
the smallest executable unit of parallelism on a CUDA device
comprises 32 threads (a warp). All NVIDIA GPUs can
support at least 768 concurrently active threads per multiprocessor,
and some GPUs support 1,024 or more active threads per multiprocessor
(see Section F.1 of the CUDA C Programming Guide). On devices that
have 30 multiprocessors (such as the NVIDIA® GeForce® GTX 280),
this leads to more than 30,000 active threads.
- Threads. Threads on a CPU are generally heavyweight entities.
The operating system must swap threads on and off CPU execution
channels to provide multithreading capability. Context switches
(when one thread is swapped out for another) are therefore slow and expensive.
By comparison, threads on GPUs are extremely lightweight. In a typical
system, thousands of threads are queued up for work (in warps of
32 threads each). If the GPU must wait on one warp of threads, it
simply begins executing work on another. Because separate registers
are allocated to all active threads, no swapping of registers or
state need occur between GPU threads. Resources stay allocated to
each thread until it completes its execution.
- RAM. Both the host system and the device have RAM. On
the host system, RAM is generally equally accessible to all code
(within the limitations enforced by the operating system). On the
device, RAM is divided virtually and physically into different types,
each of which has a special purpose and fulfills different needs.
The types of device RAM are explained in the CUDA C Programming
Guide and elsewhere in this document.
These are the primary hardware differences between CPU hosts
and GPU devices with respect to parallel programming. Other differences
are discussed as they arise elsewhere in this document.
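
The thread counts quoted in the first bullet above can be checked with a few lines of host-side arithmetic. The following is a minimal sketch, not part of any CUDA API: the helper names max_resident_threads and warps_for are hypothetical, and the inputs (four quad-core processors, 30 multiprocessors, 768 or 1,024 active threads per multiprocessor, 32 threads per warp) are the figures given in the text.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; the numbers plugged in
// below are the ones quoted in the text, not values queried from hardware.

// Upper bound on threads resident on the whole device:
// multiprocessors x active threads per multiprocessor.
constexpr std::uint64_t max_resident_threads(std::uint64_t multiprocessors,
                                             std::uint64_t threads_per_multiprocessor) {
    return multiprocessors * threads_per_multiprocessor;
}

// Number of warps needed to hold `threads` threads, rounded up,
// because a partially filled warp still occupies a whole warp.
constexpr std::uint64_t warps_for(std::uint64_t threads,
                                  std::uint64_t warp_size = 32) {
    return (threads + warp_size - 1) / warp_size;
}

// Host example from the text: four quad-core processors.
constexpr std::uint64_t host_threads = 4 * 4;                   // 16
// Assumes two hardware threads per core when Hyper-Threading is on.
constexpr std::uint64_t host_threads_ht = host_threads * 2;     // 32

// Device example from the text: 30 multiprocessors.
constexpr std::uint64_t device_threads_min  = max_resident_threads(30, 768);   // 23,040
constexpr std::uint64_t device_threads_1024 = max_resident_threads(30, 1024);  // 30,720

// Consistent with the "more than 30,000 active threads" figure in the text.
static_assert(device_threads_1024 > 30000, "expected more than 30,000 threads");
```

On an actual device, the two inputs would normally come from the multiProcessorCount and maxThreadsPerMultiProcessor fields of the cudaDeviceProp structure filled in by cudaGetDeviceProperties, rather than from hard-coded constants.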