On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote:
> > On Mon, 28 Apr 2014 11:09:46 +0200
> > Mike Galbraith <umgwanakikbuti@xxxxxxxxx> wrote:
> > 
> > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> > > 
> > > bug: migrate_disable() after blocking is too late.
> > > 
> > > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> > >  	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> > >  	if (atomic_add_unless(atomic, -1, 1))
> > >  		return 0;
> > > -	migrate_disable();
> > >  	rt_spin_lock(lock);
> > > -	if (atomic_dec_and_test(atomic))
> > > +	if (atomic_dec_and_test(atomic)) {
> > > +		migrate_disable();
> > 
> > Makes sense, as the CPU can go offline right after the lock is grabbed
> > and before the migrate_disable() is called.
> > 
> > Seems that migrate_disable() must be called before taking the lock as
> > it is done in every other location.
> 
> And for tasklist_lock, seems you also MUST do that prior to trylock as
> well, else you'll run afoul of the hotplug beast.
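For the record, the ordering Steven describes is what the pre-patch
slowpath did.  Roughly like the below -- a from-memory sketch with the
failure path spelled out, not the literal tree source; on the success
path the caller's spin_unlock() is assumed to supply the matching
rt_spin_unlock()/migrate_enable() pair:

	int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock)
	{
		/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
		if (atomic_add_unless(atomic, -1, 1))
			return 0;

		/*
		 * Pin the CPU _before_ we can block on the lock, so the
		 * CPU can't go offline between acquiring the lock and
		 * the pin taking effect.
		 */
		migrate_disable();
		rt_spin_lock(lock);
		if (atomic_dec_and_test(atomic))
			return 1;	/* locked and pinned, caller unlocks/unpins */
		rt_spin_unlock(lock);
		migrate_enable();	/* counter not zero: unlock and unpin again */
		return 0;
	}
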
This lockdep gripe is from the deadlocked crashdump with only the
clearly busted bits patched up.

[  193.033224] ======================================================
[  193.033225] [ INFO: possible circular locking dependency detected ]
[  193.033227] 3.12.18-rt25 #19 Not tainted
[  193.033227] -------------------------------------------------------
[  193.033228] boot.kdump/5422 is trying to acquire lock:
[  193.033237]  (&hp->lock){+.+...}, at: [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033238] but task is already holding lock:
[  193.033241]  (tasklist_lock){+.+...}, at: [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033242] which lock already depends on the new lock.
[  193.033242] the existing dependency chain (in reverse order) is:
[  193.033244] -> #1 (tasklist_lock){+.+...}:
[  193.033248]        [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033250]        [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033252]        [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033253]        [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033257]        [<ffffffff8155e99c>] rt_write_lock+0x2c/0x40
[  193.033260]        [<ffffffff81548169>] _cpu_down+0x219/0x440
[  193.033261]        [<ffffffff815483c0>] cpu_down+0x30/0x50
[  193.033264]        [<ffffffff813711dc>] cpu_subsys_offline+0x1c/0x30
[  193.033267]        [<ffffffff8136c2d5>] device_offline+0x95/0xc0
[  193.033269]        [<ffffffff8136c3e0>] online_store+0x40/0x80
[  193.033271]        [<ffffffff81369813>] dev_attr_store+0x13/0x30
[  193.033274]        [<ffffffff811c8820>] sysfs_write_file+0xf0/0x170
[  193.033277]        [<ffffffff8115a068>] vfs_write+0xc8/0x1d0
[  193.033279]        [<ffffffff8115a500>] SyS_write+0x50/0xa0
[  193.033282]        [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033284] -> #0 (&hp->lock){+.+...}:
[  193.033286]        [<ffffffff810ae39d>] check_prev_add+0x7bd/0x7d0
[  193.033287]        [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033289]        [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033291]        [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033293]        [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033295]        [<ffffffff8155e6a5>] rt_spin_lock+0x55/0x70
[  193.033296]        [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033299]        [<ffffffff81079ef1>] migrate_disable+0x81/0x100
[  193.033301]        [<ffffffff8155e947>] rt_read_lock+0x47/0x60
[  193.033303]        [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033305]        [<ffffffff8104777e>] SyS_wait4+0x9e/0x100
[  193.033307]        [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033307] other info that might help us debug this:
[  193.033308]  Possible unsafe locking scenario:
[  193.033309]        CPU0                    CPU1
[  193.033309]        ----                    ----
[  193.033310]   lock(tasklist_lock);
[  193.033312]                                lock(&hp->lock);
[  193.033313]                                lock(tasklist_lock);
[  193.033314]   lock(&hp->lock);
[  193.033315]  *** DEADLOCK ***
[  193.033316] 1 lock held by boot.kdump/5422:
[  193.033319]  #0:  (tasklist_lock){+.+...}, at: [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033320] stack backtrace:
[  193.033322] CPU: 0 PID: 5422 Comm: boot.kdump Not tainted 3.12.18-rt25 #19
[  193.033323] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
[  193.033326]  ffff880200550818 ffff8802004e5ad8 ffffffff8155538c 0000000000000000
[  193.033328]  0000000000000000 ffff8802004e5b28 ffffffff8154d0df ffff8802004e5b18
[  193.033330]  ffff8802004e5b50 ffff880200550818 ffff8802005507e0 ffff880200550818
[  193.033331] Call Trace:
[  193.033335]  [<ffffffff8155538c>] dump_stack+0x4f/0x91
[  193.033337]  [<ffffffff8154d0df>] print_circular_bug+0xd3/0xe4
[  193.033339]  [<ffffffff810ae39d>] check_prev_add+0x7bd/0x7d0
[  193.033342]  [<ffffffff8107e1f5>] ? sched_clock_local+0x25/0x90
[  193.033344]  [<ffffffff8107e388>] ? sched_clock_cpu+0xa8/0x120
[  193.033346]  [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033348]  [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033350]  [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033352]  [<ffffffff8155da11>] ? rt_spin_lock_slowlock+0x231/0x280
[  193.033354]  [<ffffffff8155d911>] ? rt_spin_lock_slowlock+0x131/0x280
[  193.033356]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033358]  [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033360]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033362]  [<ffffffff8155e6a5>] rt_spin_lock+0x55/0x70
[  193.033363]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033365]  [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033367]  [<ffffffff81079ef1>] migrate_disable+0x81/0x100
[  193.033369]  [<ffffffff8155e947>] rt_read_lock+0x47/0x60
[  193.033371]  [<ffffffff81046a5b>] ? do_wait+0xbb/0x2a0
[  193.033373]  [<ffffffff8155cd39>] ? schedule+0x29/0x90
[  193.033374]  [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033378]  [<ffffffff8112ded6>] ? might_fault+0x56/0xb0
[  193.033380]  [<ffffffff8104777e>] SyS_wait4+0x9e/0x100
[  193.033382]  [<ffffffff81566cc7>] ? sysret_check+0x1b/0x56
[  193.033384]  [<ffffffff81045d50>] ? task_stopped_code+0xa0/0xa0
[  193.033386]  [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033845] SMP alternatives: lockdep: fixing up alternatives
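
The #1/#0 chains above are the hotplug beast in action: cpu_down()
ends up holding hp->lock and then write-locking tasklist_lock, while
do_wait() read-locks tasklist_lock and then reaches hp->lock through
rt_read_lock() -> migrate_disable() -> pin_current_cpu().  Applied to
the trylock case mentioned above, the pin-before-lock rule would look
roughly like the sketch below.  The helper name is made up, and it
assumes a bare trylock primitive that does not already pin internally
(in the actual -rt tree the pinning sits inside the lock functions
themselves, which is exactly what has to be reordered):

	/*
	 * Hypothetical helper, illustration only: pin the CPU before
	 * even attempting the lock, and unpin on failure, so that
	 * migrate_disable() can never run (and take hp->lock) with
	 * the rwlock already held.
	 */
	static int read_trylock_pinned(rwlock_t *rwlock)
	{
		migrate_disable();		/* may take hp->lock, so do it
						 * while this lock isn't held */
		if (raw_read_trylock(rwlock))	/* placeholder for a bare trylock */
			return 1;		/* got it, stay pinned */
		migrate_enable();		/* failed, unpin again */
		return 0;
	}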