hrtimer: interrupt took 6742 ns, then RT throttling and hung machine for nearly 2 seconds

Stanislav Meduna <stano@xxxxxxxxxx> · Mon, 15 Apr 2013 13:02:42 +0200

Hi,

Apr 15 10:14:57 lnx kernel: [56281.700293] hrtimer: interrupt took 6742 ns
Apr 15 10:14:57 lnx kernel: [330740.000129] [sched_delayed] sched: RT throttling activated

From our application logs, the machine was basically hung for between
1.71 and 1.73 seconds, then resumed normal operation. A 5 ms timer
created with timerfd_create() returned 341 expirations; the 340 missed
ones correspond exactly to the 1.7 seconds.

None of the histograms in /sys/kernel/debug/tracing/latency_hist/*
reports anything unusual; they are all in the tens-of-microseconds
range. Only the wakeup shared-prio histogram is at 957 us, between two
application threads of the same priority, which is expected.

It does not seem very probable that our application is the cause of
the throttling. We have our own monitoring for runaway tasks, and it
did not kick in. Besides, the coincidence with the hrtimer message
looks very suspicious.

The kernel is 3.4.25-rt37 with full preemption, running on a 1 GHz
Celeron M industrial PC: ICH4 (ata_piix) for ATA and Intel 82801DB
PRO/100 VE (e100) for Ethernet.

Unfortunately, it is not easily reproducible: it happens once every
several days, and there is no obvious trigger.

Any hints?

Thanks
-- 
                                             Stano