Re: Random 3 to 4.5 second blocks of RT thread

Bruce Ashfield <bruce.ashfield@xxxxxxxxx> · Tue, 12 Feb 2013 09:28:43 -0500

On Tue, Feb 12, 2013 at 7:53 AM, Ralf Müller <ralf@xxxxxxxx> wrote:
>
> On 11.02.2013 at 18:08, Bruce Ashfield wrote:
>
>> On Mon, Feb 11, 2013 at 10:54 AM, Ralf Müller <ralf@xxxxxxxx> wrote:
>>>
>>> I've got a problem with a customer's box where about once a day (20 events in 16 days overall) a realtime thread blocks for 3 to 4.5 seconds. ... It is basically a loop that wakes up every millisecond ... It does this quite well, except for these random multi-second sleeps.
>>
>> Since you mentioned BIOS, I assume that you are running on an x86 platform.
>
> It's x86 - yes.
>
>> I noticed that you didn't mention whether SMIs have been ruled out as a cause of the
>> latency issues. You've probably already ruled this out, but I figured I'd ask
>> anyway :)
>
> That's a good question. What I read about SMIs when I started this project said that the latency to expect from an SMI ranges from a few microseconds up to a few milliseconds at most. As my latency constraints are relatively loose (I can live perfectly well with 50 microseconds, and while I would not be happy, I could at least deal with 10 milliseconds every now and then), I did not follow that trail very far. If SMIs really can cause 3 to 4 second blocks, I will have to look into that more deeply ... anyway, I will have a look at SMIs over the next few days. Thanks a lot for the hint.
>

I agree that such a long SMI isn't likely, but running the hwlat detector
is a fairly simple way to see if time is in fact being stolen from your
kernel, so it's something to look into.
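To illustrate the idea: roughly speaking, the hwlat detector keeps a CPU
busy with interrupts disabled, reads a clock back to back, and reports any
gap between two consecutive reads that the loop itself cannot explain. Time
that vanishes that way was taken by something outside the kernel, such as
an SMI. The C below is a hypothetical user-space approximation of that
technique, not the detector's code, and `now_ns()` / `max_gap_ns()` are
made-up names. Because user space cannot disable interrupts, a large gap
here may also be an interrupt or preemption, so treat it as a hint rather
than proof of an SMI.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read CLOCK_MONOTONIC as nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical user-space approximation of the hwlat technique:
 * spin for duration_ns, reading the clock back to back, and return the
 * largest gap seen between two consecutive reads. The loop body takes
 * nanoseconds, so a large gap means the CPU was doing something else.
 * Unlike the in-kernel detector, interrupts stay enabled here, so a gap
 * can come from an IRQ or the scheduler as well as from an SMI.
 */
uint64_t max_gap_ns(uint64_t duration_ns)
{
    uint64_t start = now_ns();
    uint64_t prev = start;
    uint64_t max_gap = 0;

    for (;;) {
        uint64_t t = now_ns();
        if (t - prev > max_gap)
            max_gap = t - prev;
        prev = t;
        if (t - start >= duration_ns)
            break;
    }
    return max_gap;
}
```

Calling `max_gap_ns(10ull * 1000000000ull)` from a `main()` on an otherwise
idle CPU and printing the result gives a rough picture of the largest stall
in a ten-second window; a result in the seconds range would be worth chasing
with the real detector.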

> BTW: Are there any reports of SMI events in the multi-second range? What would a system be doing for such a long time?

I've never seen it firsthand, but I have heard of thermal SMIs that can
"borrow" quite a bit of time (depending on what your platform is doing).
Like anything, not all BIOSes/SMIs are created equal :) That can of course
be more of a problem in your case, since it sounds like you aren't on
exactly the same h/w as the problematic system.

Anyway, just a thought and something to rule out.

If tracing can be enabled on the customer machines, you can always set a
latency threshold and stop tracing when it is crossed. That should give you
insight into what is happening in the long-latency case.
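A minimal sketch of that approach for a 1 ms loop like the one described,
assuming ftrace is available: sleep to absolute deadlines with
`clock_nanosleep(TIMER_ABSTIME)`, measure how late each wakeup is, and write
`0` to the `tracing_on` file once the lateness crosses a threshold, so the
ring buffer still holds the events leading up to the stall. The tracefs path
is passed in because it depends on the kernel and mount point (commonly
`/sys/kernel/debug/tracing/tracing_on` on kernels of this era). The function
names and the threshold are illustrative assumptions; cyclictest's `-b`
(breaktrace) option implements a similar idea.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: add ns to *ts, keeping tv_nsec normalized. */
void timespec_add_ns(struct timespec *ts, long ns)
{
    ts->tv_nsec += ns;
    while (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}

/* Hypothetical helper: return b - a in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * NSEC_PER_SEC
         + (b->tv_nsec - a->tv_nsec);
}

/* Write "0" to the (caller-supplied, assumed) tracing_on file to freeze
 * the ftrace ring buffer. Returns 0 on success, -1 on failure. */
int stop_tracing(const char *tracing_on_path)
{
    int fd = open(tracing_on_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "0", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/*
 * Run a 1 ms periodic loop for `cycles` iterations. On the first wakeup
 * that is later than threshold_ns, stop tracing and return the lateness
 * in ns. Returns 0 if no wakeup crossed the threshold.
 */
int64_t periodic_loop(long cycles, int64_t threshold_ns,
                      const char *tracing_on_path)
{
    const long period_ns = 1000000; /* 1 ms, as in the original report */
    struct timespec next, now;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        int64_t late = timespec_diff_ns(&next, &now);
        if (late > threshold_ns) {
            stop_tracing(tracing_on_path);
            return late;
        }
        /* ... the real-time work would go here ... */
    }
    return 0;
}
```

With the threshold set to, say, 10 ms (the upper bound mentioned above), the
loop returns the lateness of the first bad wakeup, and the frozen trace can
then be read from the `trace` file in the same directory. For real use the
thread would also need a real-time policy such as `SCHED_FIFO`, which this
sketch does not set.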

Cheers,

Bruce

>
> Best regards
> Ralf
>

--
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"