Vasily Volkov (UC Berkeley): Unrolling parallel loops

· klm's blog


Original post is here: eklausmeier.goip.de

Loop unrolling is not only good for sequential programming; it has similarly dramatic effects in highly parallel code. See Unrolling parallel loops (local copy), and also #pragma unroll in the NVIDIA CUDA programming guide.
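To make the idea concrete, here is a minimal sketch of an unrolled parallel loop in CUDA. It is not taken from the presentation: the kernel name `scale_add`, the helper `run_scale_add`, and the constant `ITEMS_PER_THREAD` are all made up for illustration. Each thread handles several elements instead of one, and `#pragma unroll` asks the compiler to unroll the short per-thread loop.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical constant: how many elements each thread processes.
constexpr int ITEMS_PER_THREAD = 4;

// Computes y[i] = a * x[i] + y[i]; every thread handles ITEMS_PER_THREAD elements.
__global__ void scale_add(int n, float a, const float* x, float* y)
{
    // First element of this thread. In each step, neighbouring threads
    // touch neighbouring addresses, so accesses stay coalesced.
    int base = blockIdx.x * blockDim.x * ITEMS_PER_THREAD + threadIdx.x;

    // The trip count is a compile-time constant, so the loop can be fully unrolled.
    #pragma unroll
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        int i = base + k * blockDim.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: runs the kernel on n elements and returns the
// number of results that differ from a CPU reference.
// Error checking of the CUDA calls is omitted for brevity.
int run_scale_add(int n)
{
    std::vector<float> hx(n), hy(n), ref(n);
    for (int i = 0; i < n; ++i) {
        hx[i]  = float(i % 100);          // small integers, exact in float
        hy[i]  = 1.0f;
        ref[i] = 2.0f * hx[i] + hy[i];
    }

    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads   = 256;
    int per_block = threads * ITEMS_PER_THREAD;
    int blocks    = (n + per_block - 1) / per_block;   // fewer blocks than one-element-per-thread
    scale_add<<<blocks, threads>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hy[i] != ref[i]) ++mismatches;
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_scale_add(1 << 20));
    return 0;
}
```

Build with `nvcc`. Changing `ITEMS_PER_THREAD` trades the number of threads against the work and registers used per thread, so the best value has to be measured on the target GPU.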

Some bullet points of the presentation:

More resources consumed per thread

Note: each load costs 2 arithmetic instructions

Conclusion:

See Vasily Volkov.

Cédric Augonnet, Samuel Thibault and Raymond Namyst call Vasily Volkov a "CUDA-hero" in How to get portable performance on accelerator-based platforms without the agonizing pain.

In a similar vein, Dr. Mark Harris describes the beneficial effect of unrolling in parallel reduction.
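As a rough illustration of unrolling in a reduction, the sketch below sums an array with one shared-memory tree reduction per block. It is a simplified variant written for this post, not Harris's code: it keeps `__syncthreads()` at every step instead of special-casing the last warp, and the names `block_sum`, `run_sum` and `BLOCK` are hypothetical. Because the block size is a compile-time constant, the compiler can unroll the reduction loop completely.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; must be a power of two and match the launch configuration.
constexpr int BLOCK = 256;

// Each block writes the sum of its BLOCK input elements to out[blockIdx.x].
__global__ void block_sum(const float* in, float* out, int n)
{
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i   = blockIdx.x * BLOCK + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
    __syncthreads();

    // Tree reduction in shared memory. The trip count depends only on BLOCK,
    // so the compiler is free to unroll the loop fully.
    #pragma unroll
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();               // reached by every thread in the block
    }

    if (tid == 0)
        out[blockIdx.x] = s[0];
}

// Hypothetical helper: sums n ones on the GPU and returns the total.
// Error checking of the CUDA calls is omitted for brevity.
double run_sum(int n)
{
    std::vector<float> h_in(n, 1.0f);
    int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<float> h_out(blocks);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Finish on the host: add up the per-block partial sums.
    double total = 0.0;
    for (float partial : h_out)
        total += partial;
    return total;
}

int main()
{
    int n = 1 << 20;
    printf("sum of %d ones: %.0f\n", n, run_sum(n));
    return 0;
}
```

The per-block partial sums are added on the host here to keep the sketch short; a full implementation would typically launch a second kernel pass or use atomics for that step.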