Re: Random 3 to 4.5 second blocks of RT thread

On Mon, Feb 11, 2013 at 10:54 AM, Ralf Müller <ralf@xxxxxxxx> wrote:
> Hi.
>
> I've got a problem with a box at a customer site, where about once a day (20 events overall in 16 days) a realtime thread blocks for 3 to 4.5 seconds. This thread's only job is to define a kind of time standard for other subsystems. It is basically a loop that wakes up every millisecond, increments a generation counter, does some statistics and goes to sleep for another millisecond. It does this quite well, except for these random multi-second sleeps.
>
> My problem: this only happens on customer machines (actually two different machines at different sites, both with very limited access). I tried to reproduce the bug on two test systems here, which are configured as close to the customer systems as I could manage (BIOS settings, kernel configuration, ... about the same load situation), but I have not been able to get such a block in more than 3 months (I get jitter of less than 30 microseconds, which is quite OK for the use case).
>
> The thread is configured to run with SCHED_RR priority 99 via pthread_setschedparam(). There are no other user threads in the system with such a high priority. There is no swap file, and memory is locked via mlockall(). There are two 24/7 disks configured to never go to suspend, running as an md RAID 1. The thread itself does not do anything time consuming, and it does not do anything that can block. All these systems run an old OpenSUSE 11.2 with kernel 2.6.33.2-jen97-rt. One possibly relevant kernel option that is set is "processor.max_cstate=0". I know it's not just a time warp on the system; it actually blocks that long, because a connected system reports a communication error.
>
> My next step would be to swap the board of a customer system with one of the boards from the test systems and hope the error moves to me. As this is a bit expensive and I'm not sure it will really help, I would like to ask if someone here can give any hint on how to debug such a problem, or has any idea what in a system blocks for 3 to 4.5 seconds and how I can avoid such blocks.
>
> I know the problem is related to old software and the description is at least a bit vague. My real hope is that someone says: "yes, exactly the same problem I had the day before and I solved it by setting this or that option". Hope dies last.
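
For reference while reading the description above, a loop of that kind might look roughly like the sketch below. This is not the poster's code: the function names (timespec_add_ns, timespec_diff_ns, run_ticks) and the choice of CLOCK_MONOTONIC with absolute deadlines are assumptions made for illustration, and the scheduling setup (SCHED_RR, mlockall) is left out. It records the worst wake-up lateness, which is the number that would jump to seconds during one of the reported blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS    1000000L   /* 1 ms period, as described in the post */

/* Advance an absolute timespec by ns nanoseconds (0 <= ns < 1 s). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    assert(ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/*
 * Run `iterations` ticks of the 1 ms loop. Each tick sleeps until an
 * absolute deadline (so errors do not accumulate), bumps the generation
 * counter, and tracks how late the wake-up was. Returns the worst
 * lateness seen, in nanoseconds.
 */
static int64_t run_ticks(unsigned long iterations, uint64_t *generation)
{
    struct timespec next, now;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (unsigned long i = 0; i < iterations; i++) {
        timespec_add_ns(&next, PERIOD_NS);
        /* clock_nanosleep returns the error number directly. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                               &next, NULL) == EINTR)
            ; /* interrupted by a signal: sleep again to the same deadline */
        clock_gettime(CLOCK_MONOTONIC, &now);
        (*generation)++;
        int64_t late = timespec_diff_ns(&now, &next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Logging a timestamp whenever the lateness exceeds some bound (say 100 ms) would make it possible to correlate the blocks with other events on the customer machines.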

Resending, since Gmail decided to mix HTML into my original message.

Since you mentioned BIOS, I assume that you are running on an x86 platform.
I noticed that you didn't mention whether SMIs (System Management
Interrupts) have been ruled out as a cause of the
latency issues. You've probably already ruled this out, but I figured I'd ask
anyway :)
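
One crude way to look for this kind of stall from user space is to spin on the clock and record the largest gap between two consecutive reads. The sketch below is only an illustration under assumptions: the names (now_ns, gap_stats, spin_and_measure) are made up here, and a gap seen this way can also come from ordinary interrupts or preemption, so it is a hint rather than proof of an SMI. It is most meaningful when run at high realtime priority on an otherwise idle CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

struct gap_stats {
    int64_t       max_gap_ns;      /* largest gap between two reads */
    unsigned long over_threshold;  /* number of gaps above the threshold */
    unsigned long samples;         /* total number of clock reads */
};

/*
 * Busy-loop for duration_ns, reading the clock back to back. With no
 * interference, consecutive reads are well under a microsecond apart;
 * a large gap means the CPU was taken away from this thread.
 */
static struct gap_stats spin_and_measure(int64_t duration_ns,
                                         int64_t threshold_ns)
{
    struct gap_stats s = {0, 0, 0};
    assert(duration_ns > 0);

    int64_t start = now_ns();
    int64_t prev = start;
    for (;;) {
        int64_t t = now_ns();
        int64_t gap = t - prev;
        if (gap > s.max_gap_ns)
            s.max_gap_ns = gap;
        if (gap > threshold_ns)
            s.over_threshold++;
        s.samples++;
        prev = t;
        if (t - start >= duration_ns)
            break;
    }
    return s;
}
```

For example, spin_and_measure(10LL * 1000000000LL, 100000LL) would spin for ten seconds and count every gap longer than 100 microseconds. Catching a once-a-day event this way would need a long run, so it is better suited to checking whether the two board types behave differently at all.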

Cheers,

Bruce

>
> Best regards and thanks in advance
> Ralf Müller
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"