On Tue, Feb 12, 2013 at 7:53 AM, Ralf Müller <ralf@xxxxxxxx> wrote:
>
> On 11.02.2013 at 18:08, Bruce Ashfield wrote:
>
>> On Mon, Feb 11, 2013 at 10:54 AM, Ralf Müller <ralf@xxxxxxxx> wrote:
>>>
>>> I've got a problem with a box at a customer, where about once a day
>>> (overall 20 events in 16 days) a realtime thread blocks for 3 to 4.5
>>> seconds. ... It basically is a loop that wakes up every millisecond
>>> ... It does this quite well - except for these random multi-second
>>> sleeps.
>>
>> Since you mentioned BIOS, I assume that you are running on an x86 platform.
>
> It's x86 - yes.
>
>> I noted that you didn't mention if SMIs have been ruled out as a cause of the
>> latency issues. You've probably already ruled this out, but I figured I'd ask
>> anyway :)
>
> The question is a good one. What I read about SMIs when I started this
> project said that the expected latencies from an SMI would range from a
> few microseconds to at most a few milliseconds. As my latency constraints
> are relatively weak - I can live perfectly well with 50 microseconds, and
> I would not be happy about, but could at least deal with, 10 milliseconds
> every now and then - I did not follow that trail very far. If SMIs can
> really cause blocks of 3 to 4 seconds, I will have to look into that more
> deeply ... anyway - I will have a look at SMIs in the next few days.
> Thanks a lot for that hint.
>

I agree that such a long SMI isn't likely, but running the hwlat detector is
a fairly simple way to see if time is in fact being stolen from your kernel,
so it's something to look into.

> BTW: Are there any links to SMI events in a multi-second range? What in a
> system is done within such a long time span?

I've never seen it first hand, but I have heard of thermal SMIs that can
"borrow" quite a bit of time (depending on what your platform is doing). Like
anything, not all BIOS/SMIs are created equal :) Which of course can be more
of a problem in your case, since it sounds like you aren't on exactly the
same h/w as the problematic system.
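To make the "stolen time" idea concrete, here is a minimal userspace sketch of the principle behind a hardware latency detector: poll a monotonic clock in a tight loop and record any jump between two consecutive readings that exceeds a threshold. This is only an illustration, not the hwlat detector itself, which samples inside the kernel with interrupts disabled. A userspace loop like this can also be preempted by the scheduler, so a gap it reports is not proof of an SMI. The function name `detect_gaps` and its parameters are made up for this example.

```python
import time


def detect_gaps(threshold_ns, duration_ns, clock=time.monotonic_ns):
    """Busy-poll `clock` for roughly `duration_ns` and collect large jumps.

    Returns a list of (offset_ns, gap_ns) tuples, where offset_ns is the
    time since the start of sampling at which the gap began and gap_ns is
    the distance between two consecutive clock readings.
    """
    gaps = []
    start = prev = clock()
    while prev - start < duration_ns:
        now = clock()
        # Two back-to-back reads should be very close together. A large
        # difference means this loop did not run in between.
        if now - prev > threshold_ns:
            gaps.append((prev - start, now - prev))
        prev = now
    return gaps


# Hypothetical usage: sample for one second, report gaps above 10 ms.
# for offset, gap in detect_gaps(10_000_000, 1_000_000_000):
#     print(f"gap of {gap / 1e6:.1f} ms at +{offset / 1e6:.1f} ms")
```

The `clock` parameter exists only so the logic can be exercised with a fake clock. On a real system the default monotonic clock would be used, ideally from a thread pinned to one CPU and running at realtime priority.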

Anyway, just a thought and something to rule out. If the customer machines can
have tracing enabled, you can always set a latency threshold and stop tracing
when it is crossed. That should get you insight into what is happening in the
long latency case.

Cheers,

Bruce

>
> Best regards
> Ralf
>

--
"Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end"
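
The "stop tracing when a latency threshold is crossed" suggestion above could be sketched roughly as follows, assuming ftrace is enabled and its control files are reachable under the traditional debugfs mount point `/sys/kernel/debug/tracing` (newer kernels also expose them under `/sys/kernel/tracing`). The helper `stop_trace_if_late` and the way lateness is measured are assumptions made for this example. The application's 1 ms loop would call it after each wake-up with the measured overshoot, and once tracing is switched off the ring buffer keeps the events leading up to the long latency.

```python
import os

# Assumption: tracefs is reachable at the traditional debugfs location.
TRACEFS = "/sys/kernel/debug/tracing"


def stop_trace_if_late(lateness_us, threshold_us, tracefs=TRACEFS):
    """Freeze the ftrace buffer if a wake-up was later than threshold_us.

    Writes a note to trace_marker so the event is easy to find in the
    trace, then writes "0" to tracing_on to stop recording.
    Returns True if tracing was stopped, False otherwise.
    """
    if lateness_us <= threshold_us:
        return False
    with open(os.path.join(tracefs, "trace_marker"), "w") as marker:
        marker.write(f"latency {lateness_us} us exceeded {threshold_us} us\n")
    with open(os.path.join(tracefs, "tracing_on"), "w") as switch:
        switch.write("0")
    return True
```

Writing to these files normally requires root. For a quick experiment without touching the application, cyclictest's `--breaktrace` option follows the same idea.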