Hi,

I am trying to use the timerfd feature with the RT patch, but the thread hangs (it seems to busy-wait in the kernel) on a board with a dual-core Cortex-A9 ARM processor. Below is a table of the test results:

------------------------------------------------------------------------------
Policy                | Priority     | Kernel                      | Result
------------------------------------------------------------------------------
SCHED_FIFO, SCHED_RR  | Priority = 1 | Fully Preemptible RT kernel | Works**
SCHED_FIFO, SCHED_RR  | Priority > 1 | Fully Preemptible RT kernel | Hangs*
SCHED_FIFO, SCHED_RR  | Any priority | Fully Preemptible RT kernel | Works when the test program is "strace"ed
SCHED_OTHER           |              | Fully Preemptible RT kernel | Works
Any of the 3 policies | Any priority | Low-latency Desktop kernel  | Works
------------------------------------------------------------------------------

Works** : Ran around 50000 iterations and did not see a hang.
Hangs*  : The thread is busy running inside the kernel and cannot be killed.

Most of the time, either "timerfd_settime" or the "read" that follows it hangs. Very rarely, timerfd_create itself hangs. The hang occurs whether the thread's CPU affinity is set to either core or not set at all. I have also tried a single-core kernel, and that locks up the entire system as well. I tried with and without high-resolution timers, and both hang. I have tried slightly older kernels with the RT patch as well as the latest stable 3.0.14-rt32, and the test program hangs on every kernel.

I enabled several debug-related options (PROVE_LOCKING, PROVE_RCU, DEBUG_LOCKDEP, RCU_CPU_STALL_VERBOSE, etc.) and there is no extra splat except the one-line error:

[ 295.924804] INFO: rcu_preempt_state detected stall on CPU 1 (t=1920 jiffies)

Then I tried "SysRq+t" and attached the output file "OutputOfSysReq_t.txt".
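
For readers without the attachment, the call sequence described above (timerfd_create, then timerfd_settime followed by a blocking read) can be sketched roughly as below. This is only a minimal illustration, not the attached testTimerfd.c: the helper names try_set_fifo and run_timerfd_iterations, the choice of CLOCK_MONOTONIC, and re-arming the timer on every iteration are all assumptions made for the sketch.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from testTimerfd.c): switch the calling thread
 * to SCHED_FIFO at the given priority. Returns 0 on success, -1 on failure
 * (for example EPERM when run without sufficient privileges). */
int try_set_fifo(int priority)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Hypothetical helper (not from testTimerfd.c): (re)arm a periodic timerfd
 * and block in read() once per iteration. Returns the total number of
 * expirations reported by read(), or -1 on any error. */
long run_timerfd_iterations(int iterations, long period_us)
{
    if (iterations <= 0 || period_us <= 0)
        return -1;

    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = period_us / 1000000;
    its.it_value.tv_nsec = (period_us % 1000000) * 1000;
    its.it_interval = its.it_value;   /* periodic: same interval as first expiry */

    long total = 0;
    for (int i = 0; i < iterations; i++) {
        /* Assumption: re-arm each time, so the kernel has to cancel the
         * already-armed timer before starting the new one. */
        if (timerfd_settime(fd, 0, &its, NULL) < 0) {
            close(fd);
            return -1;
        }

        /* Blocks until at least one expiration has occurred. */
        uint64_t expirations = 0;
        ssize_t n = read(fd, &expirations, sizeof expirations);
        if (n != (ssize_t)sizeof expirations) {
            close(fd);
            return -1;
        }
        total += (long)expirations;
    }

    close(fd);
    return total;
}
```

A caller would typically invoke try_set_fifo(2) first (as root) and then run_timerfd_iterations with some iteration count and period; on a kernel without the problem the second call simply returns a count of at least the number of iterations.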

Call stack of the hanging thread:

[ 312.152954] testTimerfd R running 0 1359 1343 0x00000000
[ 312.159637] Backtrace:
[ 312.162231] [<c04fd1b0>] (__schedule+0x0/0x820) from [<c04fda14>] (preempt_schedule+0x44/0x64)
[ 312.171295] [<c04fd9d0>] (preempt_schedule+0x0/0x64) from [<c0500b7c>] (_raw_spin_unlock_irqrestore+0x68/0x78)
[ 312.181793] r5:a0000113 r4:c129a728
[ 312.185577] [<c0500b14>] (_raw_spin_unlock_irqrestore+0x0/0x78) from [<c00c9558>] (hrtimer_try_to_cancel+0x54/0x1c0)
[ 312.196624] r5:00000000 r4:00000003
[ 312.200408] [<c00c9504>] (hrtimer_try_to_cancel+0x0/0x1c0) from [<c01c6a08>] (sys_timerfd_settime+0x134/0x394)
[ 312.210906] r7:00000161 r6:40048000 r5:00000000 r4:00000003
[ 312.216918] [<c01c68d4>] (sys_timerfd_settime+0x0/0x394) from [<c0063800>] (ret_fast_syscall+0x0/0x48)

I have also attached the source code of the test, "testTimerfd.c", which can be used to reproduce this issue as follows:

./testTimerfd -n5 -p2 -t500 -sF -a1
strace -f -tt ./testTimerfd -n5 -p99 -t500 -sF -a1 2>strace.log

PS: I tried an x86 system (Nehalem/Arrandale processor) running the RT kernel 3.0.1-rt11 SMP PREEMPT RT, and I see the same behavior as in the table above for ARM.

Any help to debug/fix this is highly appreciated.

Thanks in advance,
Sankara
Attachment:
sysreq_output_and_test.tar.gz
Description: GNU Zip compressed data