On Mon, 29 Aug 2011, Sankara Muthukrishnan wrote:

> On panda board, I ran the v0.74 of cyclictest
> (git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git)
> to measure the latency (./cyclictest -l4000000000 -m -S -p99 -i70 -h60

-i 70 ??? That's a 70us interval. Pretty damned close to what that CPU
can handle. :)

> -q -n). These arguments make the test use TIMER_ABSTIME for
> clock_nanosleep and CPU affinity (using sched_setaffinity) for the 2
> threads to be set to each CPU. It is expected to see large latencies
> without the RT patch. However, when I ran the tests overnight, I
> observed maximum latency of 4294967103 us (weird but it is close to
> unsigned int max). So, I instrumented the test to print some
> additional information and exit as soon as it finds such a weird
> latency. I was also trying to stress ethernet/network/interrupts of
> the system with SFTP but I think (not very sure) I could reproduce the
> issue without that. clock_nanosleep was called to sleep until
> 27608:739311172 (sec:nsec), but after clock_nanosleep returned,
> clock_gettime returned the time as 27608:739117429 (sec:nsec) which is
> roughly 193 usec earlier than the value passed to clock_nanosleep and
> that is the bug. I ran the test with just one thread ( remove "-S" and
> add "-t1 -a1 -n" ) and saw the weird latency of 4294967294 usec.

That looks like a problem in the clocksource, i.e. time is going
backward or having weird momentary jumps.

You could verify that by running one or more tight loops which do

	clock_gettime(CLOCK_MONOTONIC, &prev);

	while (1) {
		clock_gettime(CLOCK_MONOTONIC, &curr);
		if (curr < prev) /* Use a proper compare function for timespec! */
			printf(.....);
		prev = curr;
	}

If that triggers on RT or on a vanilla kernel, then the problem is
definitely somewhere in the timekeeping/clocksource area.

Thanks,

	tglx