Hello,
Thanks for the testing tips.
I guess I should open a new thread, because what follows is more about
the results than the testing procedure.
Quick recap of my original test: I have a kernel module timer (clock
monotonic, absolute) flipping a bit with outb(val, 0x3f8 + COM_MCR).
I ran cyclictest in parallel with all the load (make -jN), once with a
local kernel tree and once with one on NFS; both give similar results:
cyclictest is spot on, but my timer makes occasional excursions.
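For concreteness, the module boils down to something like this (a
minimal sketch; the 500 us period and the COM_BASE/MCR_RTS names are
illustrative, not my exact setup):

/* Minimal sketch of the flipping timer (illustrative values). */
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/io.h>

#define COM_BASE 0x3f8          /* COM0 base port */
#define COM_MCR  4              /* modem control register offset */
#define MCR_RTS  0x02           /* RTS bit in the MCR */

static struct hrtimer flip_timer;
static ktime_t period;
static u8 mcr_val;

static enum hrtimer_restart flip_cb(struct hrtimer *t)
{
	mcr_val ^= MCR_RTS;                     /* toggle RTS */
	outb(mcr_val, COM_BASE + COM_MCR);
	hrtimer_forward_now(t, period);         /* re-arm, absolute time */
	return HRTIMER_RESTART;
}

static int __init flip_init(void)
{
	period = ktime_set(0, 500000);          /* 500 us, for example */
	hrtimer_init(&flip_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	flip_timer.function = flip_cb;
	hrtimer_start(&flip_timer, ktime_add(ktime_get(), period),
		      HRTIMER_MODE_ABS);
	return 0;
}

static void __exit flip_exit(void)
{
	hrtimer_cancel(&flip_timer);
}

module_init(flip_init);
module_exit(flip_exit);
MODULE_LICENSE("GPL");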
So I looked at cyclictest and thought: let's do it the same way. I now
have another cdev module giving userland access to flip COM0 through an
ioctl... and to my surprise, that performs well.
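The userland side is essentially a cyclictest-style loop (again a
sketch; /dev/comflip and COMFLIP_TOGGLE are illustrative names for my
cdev and its ioctl command):

/* Cyclictest-style flipper (illustrative device/ioctl names). */
#include <fcntl.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>

#define COMFLIP_TOGGLE _IO('c', 0)      /* hypothetical ioctl command */
#define NSEC_PER_SEC   1000000000L

int main(void)
{
	struct sched_param sp = { .sched_priority = 99 };
	struct timespec next;
	int fd = open("/dev/comflip", O_RDWR);  /* hypothetical cdev */

	sched_setscheduler(0, SCHED_FIFO, &sp); /* RT priority 99 */
	mlockall(MCL_CURRENT | MCL_FUTURE);     /* avoid page faults */

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		next.tv_nsec += 500000;         /* 500 us period */
		if (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		ioctl(fd, COMFLIP_TOGGLE);      /* flip RTS from userland */
	}
	return 0;
}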
I'm a newcomer to these matters, but I find it counter-intuitive that
my RT task (SCHED_FIFO priority 99) "works better" than my kernel
timer. I'm looking to understand this better. Is it just expected? Are
there parameters I can set to harden things in my kernel timer? Any
pointers to understand this would be great.
Regards,
Matthieu
On 05/16/12 08:55, Clark Williams wrote:
On Tue, 15 May 2012 21:55:37 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
On Tue, 2012-05-15 at 16:08 -0700, Matthieu Bec wrote:
Hello all,
I was wondering what people used to check RT_PREEMPT behavior under
load/stress?
There is a test suite that Red Hat uses called rt-eval (I believe).
Clark can give you more info on that.
It's called rteval and I have a git tree here:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git
It's basically some Python scripting to do much of what Steven
describes below. When it starts up, it kicks off a kernel make with
twice the number of available processors (make -j<n*2>) and runs
hackbench, both in a loop. Then it kicks off cyclictest to measure the
system latency under load.
I usually run it like this:
$ sudo rteval --duration=12h
At the end it summarizes the results of the run.
I'm trying to test the accuracy of my timers and have a test where I
set up a kernel module with an hrtimer flipping the RTS bit on serial
COM0 periodically, which I can watch on an oscilloscope. The scope
triggers on the rising edge; I call jitter whatever shows on the
falling side. Under no specific load I get ~10 us (worst case, waiting
a long time).
My initial idea for stressing the system was to compile a kernel with
make -j8 (#cores), which I thought would exercise CPU and IO if
anything. As it happens, it's "mostly good", but I do get occasional
(yet repeatable) wild excursions (>100 us).
The tests I do are the following:
I run "cyclictest -n -p 80 -t -i 250", then in another window I run a
kernel compile using distcc (to stress the network as well) with make
-j40; it basically does:
while :; do make clean; make -j40; done
Then I also run hackbench (written by Rusty Russell), with:
while :; do hackbench 50; done
I run the above on a single machine, while on another machine I run
ktest against the -rt kernel to test different configs (with and without
PREEMPT_RT enabled and such). I do this for both i386 and x86_64.
Looking around, I found a tool called 'stress':
http://weather.ou.edu/~apw/projects/stress/
Under these new conditions, the system behaves really well again: ~20
us, stable all the way.
So the two tests give different results, and I'm not sure which to
trust. I was thinking maybe there is some weird interaction between the
kernel and building the kernel that makes the 'bad' test invalid?
I have RT_PREEMPT 3.0.18-rt34 SMP x86_64
Now, I run the above stress tests that I mentioned for several hours
before I release a stable kernel. I run this on a 2.6GHz Xeon Core2,
and I may hit at most 70us latency with cyclictest. That's a high; it
usually stays below 50us. We consider >100us on this type of hardware a
bug which needs to be fixed.
-- Steve
--
Matthieu Bec
GMTO Corp.
cell: +1 626 354 9367
phone: +1 626 204 0527
P.O. Box 90933
Pasadena, CA 91109-0933