...
> Yeah, one thing I could do is disable interrupts, measure the cycle
> count of doing an individual iteration, do this N times, and take the
> minimum value as the time to compare. In the end I'll then have two
> numbers to compare, like I do in this patch. In theory the variance on
> that should be really tight. N will have to depend on the overall
> amount of time I'm taking so as not to shut interrupts off for very
> long. Let me experiment with this and see how the results look.
> -Evan

I doubt you'll need many iterations or a long test.
You can do tests in userspace without disabling pre-emption or
interrupts - the large/silly values they generate are easily ignored.

I suspect you'll get enough info from something like:

	unsigned long x[2];
	volatile unsigned long *p = (void *)((unsigned char *)x + 1);

	full_cpu_barrier();
	start = rdtsc();
	full_cpu_barrier();
	*p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p; *p;
	full_cpu_barrier();
	elapsed = rdtsc() - start;

Once the i-cache is loaded it should be pretty constant.
For aligned addresses I'd expect each extra '*p' to be one more clock.
With hardware support for misaligned transfers, at most 2 clocks each
(test on x86 and it will be 1 clock).
The emulated version will be 100s or 1000s of clocks.

I'm not sure how much of a cpu barrier you need.
It definitely needs to wait for all memory accesses and the rdtsc()
to complete.

	David
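
For reference, a userspace sketch of the measurement described above,
combining the 16 back-to-back loads with the "take the minimum of N runs"
idea. Several things here are assumptions and not part of the original
suggestion: clock_gettime(CLOCK_MONOTONIC) stands in for rdtsc() so the
code is not tied to x86, C11 atomic_thread_fence() stands in for
full_cpu_barrier() (it orders memory accesses but does not serialise
instruction execution, so it is probably weaker than what is really
wanted), and the names now_ns(), time_16_loads() and min_time_16_loads()
are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Assumed stand-in for rdtsc(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time 16 volatile loads through p. The fences are an assumed
 * substitute for full_cpu_barrier(); they only order memory accesses.
 * Note: loading through a misaligned pointer is undefined behaviour in
 * ISO C; this relies on the compiler emitting a plain load instruction.
 */
static uint64_t time_16_loads(const volatile unsigned long *p)
{
	uint64_t start, elapsed;

	atomic_thread_fence(memory_order_seq_cst);
	start = now_ns();
	atomic_thread_fence(memory_order_seq_cst);
	(void)*p; (void)*p; (void)*p; (void)*p;
	(void)*p; (void)*p; (void)*p; (void)*p;
	(void)*p; (void)*p; (void)*p; (void)*p;
	(void)*p; (void)*p; (void)*p; (void)*p;
	atomic_thread_fence(memory_order_seq_cst);
	elapsed = now_ns() - start;

	return elapsed;
}

/*
 * Repeat the measurement and keep the smallest value, so runs that were
 * pre-empted or interrupted (the large/silly values) are discarded.
 */
static uint64_t min_time_16_loads(const volatile unsigned long *p, int runs)
{
	uint64_t best = UINT64_MAX;

	assert(runs > 0);
	for (int i = 0; i < runs; i++) {
		uint64_t t = time_16_loads(p);

		if (t < best)
			best = t;
	}
	return best;
}
```

Calling min_time_16_loads() once with x and once with the misaligned p
gives the two numbers to compare. With a nanosecond clock in place of a
cycle counter, 16 loads may sit close to the timer's resolution, so the
aligned and hardware-misaligned cases may be hard to tell apart; the
emulated case should still stand out.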