On 5/1/2015 5:23 PM, Frederic Weisbecker wrote:
> On Fri, May 01, 2015 at 03:57:51PM -0400, Chris Metcalf wrote:
>> For example, booting with only cpu 0 as a housekeeping core (and
>> therefore all watchdogs 1-35 on my 36-core tilegx are parked), and
>> immediately doing "echo 0 > /proc/sys/kernel/watchdog", I see
>> (via SysRq ^O-l) the first parked watchdog, on cpu 1, hung with:
>>
>> frame 0: 0xfffffff7000f2928 lock_hrtimer_base+0xb8/0xc0
>> frame 1: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 2: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 3: 0xfffffff7000f2b98 hrtimer_cancel+0x40/0x68
>> frame 4: 0xfffffff70014cce0 watchdog_disable+0x50/0x70
>> frame 5: 0xfffffff70008c2d0 smpboot_thread_fn+0x350/0x438
>> frame 6: 0xfffffff700084b28 kthread+0x160/0x178
>
> Have you tried to do that before your patchset?

Yes, it works fine. It requires the presence of the parked threads to trigger the issue.

>> The config does not have NO_HZ_FULL_ALL or NO_HZ_FULL_SYSIDLE
>> set, and does have RCU_FAST_NO_HZ and RCU_NOCB_CPU_ALL.
>> I don't really know how to start debugging this, but I do know that
>> unparking the threads first avoids the issue :-)
>
> Do you have CONFIG_PROVE_LOCKING=y ?

There seems to be some skew between the community version, which is throwing a
bunch of errors when I enable PROVE_LOCKING, and our internal version where some
things are not yet upstreamed but PROVE_LOCKING works :-)
I'll try to set aside some time to reconcile the two to figure it out.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com