The first and simplest case of coalescing can be achieved by any CUDA-enabled device: the k-th thread accesses the k-th word in a segment. Not all threads need to participate. (See Figure 1. This figure assumes a device of compute capability 1.x; for devices of compute capability 2.x the figure would be much the same, except twice as wide.)
This access pattern results in a single 64-byte transaction, indicated by the red rectangle. Even though one word is not requested, all data in the segment are fetched. If the accesses by threads were permuted within this segment, a device of compute capability 1.2 or higher would still perform a single 64-byte transaction, but a device of compute capability 1.1 or lower would perform 16 serialized transactions.