> It doesn't happen often, but it is a possibility that the kernel > calibrates the delay wrong because of timing glitches caused by CPU > migration, paging, or other phenomena which are supposed to be > transparent to the kernel (but cause temporal lapse). We're supposed to handle those because they happen on real hardware too with long running SMM handlers. Or at least there was a effort some time ago to do this. If it wasn't enough we'll likely need to fix the code. > In that case, the > kernel may not make enough progress in a spin delay loop to properly > reach the number of microseconds required for N number of timer ticks to > occur. Hmm, mdelay is polling RDTSC and assumes it makes forward progress and waits until the time that was estimated at the original TSC<->PIT calibration passed. While there is a spin loop it is definitely polling a timer that is supposed to tick properly even in virtualization. You're saying that doesn't work on vmware? Does it have trouble with RDTSC? Anyways if polling against TSC doesn't work I suppose we could change it to poll against some other timer. > In theory this can happen on a real machine, as SMM mode could > be active, doing USB device emulation or something that takes a while > during the lpj calibration and throwing the computation off. Yep. > By changing the parameters (N ticks at K Hz in T seconds), it is easy to > create an unstable measurement that can achieve high failure rates, > although in practice the Linux parameters appear to be reasonable enough > that it is not a major problem. Hmm, why exactly? -Andi