Hi folks,

I was comparing profiles between two machines and noticed a big, kinda weird discrepancy between them on an unlink workload. I pulled the string, and found the problem was cacheline bouncing interfering with the cache residency of read-only variables. Hence the first patch.

The second patch came about from working out which variable was causing the cacheline bouncing. The cost wasn't showing up in the CPU usage profiles as overhead in the code paths that were contending on it. And for larger machines, converting the atomic variable to a per-cpu counter provides a major performance win.

Thoughts, comments, etc all welcome.

-Dave.