This is a note to let you know that I've just added the patch titled timekeeping: Avoid possible deadlock from clock_was_set_delayed to the 3.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: timekeeping-avoid-possible-deadlock-from-clock_was_set_delayed.patch and it can be found in the queue-3.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 6fdda9a9c5db367130cf32df5d6618d08b89f46a Mon Sep 17 00:00:00 2001 From: John Stultz <john.stultz@xxxxxxxxxx> Date: Tue, 10 Dec 2013 17:18:18 -0800 Subject: timekeeping: Avoid possible deadlock from clock_was_set_delayed From: John Stultz <john.stultz@xxxxxxxxxx> commit 6fdda9a9c5db367130cf32df5d6618d08b89f46a upstream. As part of normal operaions, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks clock_was_set_delayed() was suppoed to allow us to avoid deadlocks between the timekeeping the hrtimer subsystem, so that we could notify the hrtimer subsytem the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later once we were out of the timekeeing code. But unfortunately the lock chains are complex enoguh that in scheduling delayed work, we end up eventually trying to grab an hrtimer lock. Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abrieviated) message: [ 251.100221] ====================================================== [ 251.100221] [ INFO: possible circular locking dependency detected ] [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted [ 251.101967] ------------------------------------------------------- [ 251.101967] kworker/10:1/4506 is trying to acquire lock: [ 251.101967] (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70 [ 251.101967] [ 251.101967] but task is already holding lock: [ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] which lock already depends on the new lock. [ 251.101967] [ 251.101967] [ 251.101967] the existing dependency chain (in reverse order) is: [ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}: [snipped] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [snipped] -> #3 (&rq->lock){-.-.-.}: [snipped] -> #2 (&p->pi_lock){-.-.-.}: [snipped] -> #1 (&(&pool->lock)->rlock){-.-...}: [ 251.101967] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0 [ 251.101967] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580 [ 251.101967] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0 [ 251.101967] [<ffffffff84398500>] _raw_spin_lock+0x40/0x80 [ 251.101967] [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0 [ 251.101967] [<ffffffff81154168>] queue_work_on+0x98/0x120 [ 251.101967] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30 [ 251.101967] [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160 [ 251.101967] [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70 [ 251.101967] [<ffffffff843a4b49>] ia32_sysret+0x0/0x5 [ 251.101967] -> #0 (timekeeper_seq){----..}: [snipped] [ 251.101967] other info that might help us debug this: [ 251.101967] [ 251.101967] Chain exists of: timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11 [ 251.101967] Possible unsafe locking scenario: [ 251.101967] [ 251.101967] CPU0 CPU1 [ 251.101967] ---- ---- [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(&rt_b->rt_runtime_lock); [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(timekeeper_seq); [ 251.101967] [ 251.101967] *** DEADLOCK *** [ 251.101967] [ 251.101967] 3 locks held by kworker/10:1/4506: [ 251.101967] #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] stack backtrace: [ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 [ 251.101967] Workqueue: events clock_was_set_work So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead using a flag variable to decide if we should call clock_was_set() once we've released the locks. This works for the case here, where the do_adjtimex() was the deadlock trigger point. Unfortuantely, in update_wall_time() we still hold the jiffies lock, which would deadlock with the ipi triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point. Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Prarit Bhargava <prarit@xxxxxxxxxx> Cc: Richard Cochran <richardcochran@xxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Sasha Levin <sasha.levin@xxxxxxxxxx> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> Tested-by: Sasha Levin <sasha.levin@xxxxxxxxxx> Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- kernel/time/timekeeping.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -988,7 +988,8 @@ static void timekeeping_adjust(s64 offse * * Returns the unconsumed cycles. */ -static cycle_t logarithmic_accumulation(cycle_t offset, int shift) +static cycle_t logarithmic_accumulation(cycle_t offset, int shift, + unsigned int *clock_set) { u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift; u64 raw_nsecs; @@ -1010,7 +1011,7 @@ static cycle_t logarithmic_accumulation( timekeeper.xtime.tv_sec += leap; timekeeper.wall_to_monotonic.tv_sec -= leap; if (leap) - clock_was_set_delayed(); + *clock_set = 1; } /* Accumulate raw time */ @@ -1042,6 +1043,7 @@ static void update_wall_time(void) struct clocksource *clock; cycle_t offset; int shift = 0, maxshift; + unsigned int clock_set = 0; unsigned long flags; write_seqlock_irqsave(&timekeeper.lock, flags); @@ -1077,7 +1079,7 @@ static void update_wall_time(void) maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1; shift = min(shift, maxshift); while (offset >= timekeeper.cycle_interval) { - offset = logarithmic_accumulation(offset, shift); + offset = logarithmic_accumulation(offset, shift, &clock_set); if(offset < timekeeper.cycle_interval<<shift) shift--; } @@ -1131,7 +1133,7 @@ static void update_wall_time(void) timekeeper.xtime.tv_sec += leap; timekeeper.wall_to_monotonic.tv_sec -= leap; if (leap) - clock_was_set_delayed(); + clock_set = 1; } timekeeping_update(false); @@ -1139,6 +1141,8 @@ static void update_wall_time(void) out: write_sequnlock_irqrestore(&timekeeper.lock, flags); + if (clock_set) + clock_was_set_delayed(); } /** Patches currently in stable-queue which might be from john.stultz@xxxxxxxxxx are queue-3.4/rtc-cmos-add-an-alarm-disable-quirk.patch queue-3.4/timekeeping-avoid-possible-deadlock-from-clock_was_set_delayed.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html