On Mon, Aug 25, 2008 at 8:44 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> On Mon, Aug 25, 2008 at 3:03 PM, Daniel J Blueman
> <daniel.blueman@xxxxxxxxx> wrote:
>> Hi Linus, Vegard,
>>
>> On Sun, Aug 24, 2008 at 7:58 PM, Linus Torvalds
>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> On Sun, 24 Aug 2008, Vegard Nossum wrote:
>> [snip]
>>> Anyway, I think your patch is likely fine, I just thought it looked a bit
>>> odd to have a loop to move a list from one head pointer to another.
>>>
>>> But regardless, it would need some testing. Daniel?
>>
>> This opens another lockdep report at boot-time [1] - promoting
>> pool_lock may not be the best fix?
>>
>> We then see a new deadlock condition (on the pool_lock spinlock) [2],
>> which seemingly was avoided by taking the debug-bucket lock first.
>>
>> We reproduce this by booting with debug_objects=1 and causing a lot
>> of activity.
>
> Thanks. I get the same thing here as well.
>
> I tried your suggestion of promoting the lock to irq-safe, and it
> fixed the warning for me (I didn't get or look for deadlocks yet, but
> it seems likely that they are caused by the same thing?); the patch is
> attached for reference.
>
> I also don't know if this is the best fix, but I don't have any
> other (better) suggestions.
>
> Others are welcome to pick it up from here...

The solution looks like it needs to get the lock ordering correct
w.r.t. SLUB, as we get this, alas:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc4-229c-debug #1
-------------------------------------------------------
make/9475 is trying to acquire lock:
 (&n->list_lock){++..}, at: [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0

but task is already holding lock:
 (&obj_hash[i].lock){++..}, at: [<ffffffff8046965e>] __debug_check_no_obj_freed+0x6e/0x170

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (&obj_hash[i].lock){++..}:
       [<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
       [<ffffffff802707e1>] lock_acquire+0x91/0xc0
       [<ffffffff806a55d3>] _spin_lock_irqsave+0x53/0x90
       [<ffffffff8046965e>] __debug_check_no_obj_freed+0x6e/0x170
       [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
       [<ffffffff802b4a30>] free_hot_cold_page+0x130/0x290
       [<ffffffff802b4c20>] free_hot_page+0x10/0x20
       [<ffffffff802b5695>] __free_pages+0x45/0x50
       [<ffffffff802df301>] __free_slab+0xa1/0x150
       [<ffffffff802df3e8>] discard_slab+0x38/0x60
       [<ffffffff802e235f>] kmem_cache_shrink+0x18f/0x2c0
       [<ffffffff8048f421>] acpi_os_purge_cache+0xe/0x12
       [<ffffffff804a6cbd>] acpi_purge_cached_objects+0x15/0x3d
       [<ffffffff804a6d4d>] acpi_initialize_objects+0x4e/0x59
       [<ffffffff8130559b>] acpi_init+0x91/0x226
       [<ffffffff80209265>] do_one_initcall+0x45/0x190
       [<ffffffff812e5c9b>] kernel_init+0x14d/0x1b2
       [<ffffffff8020d919>] child_rip+0xa/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&n->list_lock){++..}:
       [<ffffffff80270495>] __lock_acquire+0xea5/0x1160
       [<ffffffff802707e1>] lock_acquire+0x91/0xc0
       [<ffffffff806a5441>] _spin_lock+0x41/0x80
       [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0
       [<ffffffff802dfec4>] kmem_cache_free+0xa4/0x110
       [<ffffffff8046958b>] free_object+0x6b/0xd0     -> (&db->lock) taken
       [<ffffffff804696aa>] __debug_check_no_obj_freed+0xba/0x170
       [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
       [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
       [<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
       [<ffffffff80248273>] release_task+0x233/0x3d0
       [<ffffffff80248960>] wait_consider_task+0x550/0x8b0
       [<ffffffff80248e16>] do_wait+0x156/0x350
       [<ffffffff802490a6>] sys_wait4+0x96/0xf0
       [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by make/9475:
 #0:  (tasklist_lock){..??}, at: [<ffffffff80248084>] release_task+0x44/0x3d0
 #1:  (&obj_hash[i].lock){++..}, at:
[<ffffffff8046965e>] __debug_check_no_obj_freed+0x6e/0x170

stack backtrace:
Pid: 9475, comm: make Not tainted 2.6.27-rc4-229c-debug #1

Call Trace:
 [<ffffffff8026e2b7>] print_circular_bug_tail+0xa7/0xf0
 [<ffffffff80270495>] __lock_acquire+0xea5/0x1160
 [<ffffffff802707e1>] lock_acquire+0x91/0xc0
 [<ffffffff802dfb23>] ? __slab_free+0x3a3/0x3f0
 [<ffffffff806a5441>] _spin_lock+0x41/0x80
 [<ffffffff802dfb23>] ? __slab_free+0x3a3/0x3f0
 [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0
 [<ffffffff8046958b>] ? free_object+0x6b/0xd0
 [<ffffffff802dfec4>] kmem_cache_free+0xa4/0x110
 [<ffffffff8046958b>] ? free_object+0x6b/0xd0
 [<ffffffff8046958b>] free_object+0x6b/0xd0     -> &db->lock taken
 [<ffffffff804696aa>] __debug_check_no_obj_freed+0xba/0x170
 [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
 [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
 [<ffffffff8024264b>] ? __cleanup_signal+0x1b/0x20
 [<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
 [<ffffffff80248273>] release_task+0x233/0x3d0
 [<ffffffff80248960>] wait_consider_task+0x550/0x8b0
 [<ffffffff80248e16>] do_wait+0x156/0x350
 [<ffffffff8023b6f0>] ? default_wake_function+0x0/0x10
 [<ffffffff802490a6>] sys_wait4+0x96/0xf0
 [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
--
Daniel J Blueman