Parallelism is often applied in the quest for higher graphics performance. Parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems.
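The loss of locality can be illustrated with a toy model. The sketch below is not the cycle-accurate simulator used in the study; it makes several simplifying assumptions that do not come from the original work: texels map one-to-one onto pixels, each rasterizer has an unbounded private cache, the texture is fetched in square blocks, and screen tiles are interleaved diagonally across rasterizers. The function name and parameters are hypothetical.

```python
def total_block_fetches(width, height, num_rasterizers, tile_size, block_size=4):
    """Count texture-block fetches summed over all rasterizers (toy model)."""
    # One set of touched blocks per rasterizer: models an unbounded private
    # cache, so only first-touch (compulsory) misses are counted.
    touched = [set() for _ in range(num_rasterizers)]
    for y in range(height):
        for x in range(width):
            # Assumed work distribution: tiles interleaved along diagonals.
            owner = (x // tile_size + y // tile_size) % num_rasterizers
            # Assumed 1:1 mapping, so pixel (x, y) reads texel (x, y).
            touched[owner].add((x // block_size, y // block_size))
    return sum(len(blocks) for blocks in touched)


if __name__ == "__main__":
    for tile in (1, 2, 4):
        print(tile, total_block_fetches(64, 64, 4, tile))
```

Under these assumptions, a single rasterizer covering a 64×64 screen fetches each of the 256 blocks once. With four rasterizers and 1-pixel tiles, every block is fetched by all four units (1024 fetches in total), while tiles that match the 4×4 block size bring the total back to 256. Real caches are finite and real texture mappings are not one-to-one, so the model only shows the direction of the effect.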
We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantifies inefficiencies due to redundant work, inherent parallel load imbalance, insufficient memory bandwidth, and resource contention. We find that parallel texture caching works well and is general enough to work with a wide variety of rasterization architectures.
The parallel texture caching study was carried out primarily by Homan Igehy, Matthew Eldridge, and Pat Hanrahan. Details of the work can be found in the HWWS '99 paper entitled "Parallel Texture Caching."