timerfd functions hang on both x86 and ARM with RT patch and RT scheduling policies

Hi,

I am trying to use the timerfd feature with the RT patch, but the thread
hangs (it seems to busy-wait in the kernel) on a board with a dual-core
Cortex-A9 ARM processor. Below is a table of the test results:

--------------------------------------------------------------------------------------
Policy                 | Priority | Kernel preemption model     | Result
--------------------------------------------------------------------------------------
SCHED_FIFO, SCHED_RR   | 1        | Fully Preemptible RT kernel | Works**
SCHED_FIFO, SCHED_RR   | > 1      | Fully Preemptible RT kernel | Hangs*
SCHED_FIFO, SCHED_RR   | Any      | Fully Preemptible RT kernel | Works when the test program is "strace"d
SCHED_OTHER            |          | Fully Preemptible RT kernel | Works
Any of the 3 policies  | Any      | Low-latency Desktop kernel  | Works
--------------------------------------------------------------------------------------
Works** : Ran around 50000 iterations and did not see a hang.
Hangs* : The thread is busy running inside the kernel and cannot be
killed. Most of the time, "timerfd_settime" or the "read" that follows
it hangs. Very rarely, timerfd_create itself hangs. The hang happens
whether the thread's CPU affinity is set to either core or is not set
at all. I have also tried a single-core kernel, and there the hang
locks up the entire system. I tried with and without high-resolution
timers, and both configurations hang.

I have tried slightly older kernels with the RT patch and also the
latest stable 3.0.14-rt32, and the test program hangs on every kernel.
I enabled several debug-related options (PROVE_LOCKING, PROVE_RCU,
DEBUG_LOCKDEP, RCU_CPU_STALL_VERBOSE, etc.) and there is no extra splat
except the one-line error "[  295.924804] INFO: rcu_preempt_state
detected stall on CPU 1 (t=1920 jiffies)". Then I tried "SysRq+t"
and attached the output file "OutputOfSysReq_t.txt". The call stack of
the hanging thread:

[  312.152954] testTimerfd     R running      0  1359   1343 0x00000000
[  312.159637] Backtrace:
[  312.162231] [<c04fd1b0>] (__schedule+0x0/0x820) from [<c04fda14>] (preempt_schedule+0x44/0x64)
[  312.171295] [<c04fd9d0>] (preempt_schedule+0x0/0x64) from [<c0500b7c>] (_raw_spin_unlock_irqrestore+0x68/0x78)
[  312.181793]  r5:a0000113 r4:c129a728
[  312.185577] [<c0500b14>] (_raw_spin_unlock_irqrestore+0x0/0x78) from [<c00c9558>] (hrtimer_try_to_cancel+0x54/0x1c0)
[  312.196624]  r5:00000000 r4:00000003
[  312.200408] [<c00c9504>] (hrtimer_try_to_cancel+0x0/0x1c0) from [<c01c6a08>] (sys_timerfd_settime+0x134/0x394)
[  312.210906]  r7:00000161 r6:40048000 r5:00000000 r4:00000003
[  312.216918] [<c01c68d4>] (sys_timerfd_settime+0x0/0x394) from [<c0063800>] (ret_fast_syscall+0x0/0x48)

I have also attached the source code of the test, "testTimerfd.c". It
can be used to reproduce this issue as follows:

./testTimerfd -n5 -p2 -t500 -sF -a1
strace -f -tt ./testTimerfd -n5 -p99 -t500 -sF -a1 2>strace.log

PS: I tried an x86 system (Nehalem/Arrandale processor) running the RT
kernel 3.0.1-rt11 SMP PREEMPT RT, and I see the same behavior described
in the table above for ARM.

Any help to debug/fix this is highly appreciated.

Thanks in advance,
Sankara

Attachment: sysreq_output_and_test.tar.gz
Description: GNU Zip compressed data

