2008/5/12 David Chinner <dgc@xxxxxxx>:
> On Sun, May 11, 2008 at 09:18:07AM +0530, Kamalesh Babulal wrote:
>> Kamalesh Babulal wrote:
>> > Adding the cc to kernel-list, Ingo Molnar and Peter Zijlstra
>> >
>> > Alexander Beregalov wrote:
>> >> [ INFO: possible circular locking dependency detected ]
>> >> 2.6.26-rc1-00279-g28a4acb #13
>> >> -------------------------------------------------------
>> >> nfsd/3087 is trying to acquire lock:
>> >>  (iprune_mutex){--..}, at: [<c016f947>] shrink_icache_memory+0x38/0x19b
>> >>
>> >> but task is already holding lock:
>> >>  (&(&ip->i_iolock)->mr_lock){----}, at: [<c0210b83>] xfs_ilock+0xa2/0xd6
>> >>
>> >> which lock already depends on the new lock.
>> >>
>> >> the existing dependency chain (in reverse order) is:
>> >>
>> >> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>> >>        [<c01352e6>] __lock_acquire+0xa0c/0xbc6
>> >>        [<c013550a>] lock_acquire+0x6a/0x86
>> >>        [<c012c39a>] down_write_nested+0x33/0x6a
>> >>        [<c0210b5c>] xfs_ilock+0x7b/0xd6
>> >>        [<c0210cd5>] xfs_ireclaim+0x1d/0x59
>> >>        [<c022edfe>] xfs_finish_reclaim+0x173/0x195
>> >>        [<c0230fa3>] xfs_reclaim+0xb3/0x138
>> >>        [<c023b4cb>] xfs_fs_clear_inode+0x55/0x8e
>> >>        [<c016f60b>] clear_inode+0x83/0xd2
>> >>        [<c016f88a>] dispose_list+0x3c/0xc1
>> >>        [<c016fa82>] shrink_icache_memory+0x173/0x19b
>> >>        [<c014a68d>] shrink_slab+0xda/0x14e
>> >>        [<c014a8e5>] try_to_free_pages+0x1e4/0x2a2
>> >>        [<c0146997>] __alloc_pages_internal+0x23a/0x39d
>> >>        [<c0146b11>] __alloc_pages+0xa/0xc
>> >>        [<c01483b2>] __do_page_cache_readahead+0xaa/0x16a
>> >>        [<c01484bc>] force_page_cache_readahead+0x4a/0x74
>> >>        [<c014c9b0>] sys_madvise+0x308/0x400
>> >>        [<c0102b25>] sysenter_past_esp+0x6a/0xb1
>> >>        [<ffffffff>] 0xffffffff
>> >>
>> >> -> #0 (iprune_mutex){--..}:
>> >>        [<c0135203>] __lock_acquire+0x929/0xbc6
>> >>        [<c013550a>] lock_acquire+0x6a/0x86
>> >>        [<c0356a6f>] mutex_lock_nested+0xb4/0x226
>> >>        [<c016f947>] shrink_icache_memory+0x38/0x19b
>> >>        [<c014a68d>] shrink_slab+0xda/0x14e
>> >>        [<c014a8e5>] try_to_free_pages+0x1e4/0x2a2
>> >>        [<c0146997>] __alloc_pages_internal+0x23a/0x39d
>> >>        [<c0146b11>] __alloc_pages+0xa/0xc
>> >>        [<c01483b2>] __do_page_cache_readahead+0xaa/0x16a
>> >>        [<c014866c>] ondemand_readahead+0x119/0x127
>> >>        [<c01486cc>] page_cache_async_readahead+0x52/0x5d
>> >>        [<c0178e46>] generic_file_splice_read+0x290/0x4a8
>> >>        [<c0239f06>] xfs_splice_read+0x4b/0x78
>> >>        [<c0237713>] xfs_file_splice_read+0x24/0x29
>> >>        [<c0178182>] do_splice_to+0x45/0x63
>> >>        [<c01783f6>] splice_direct_to_actor+0xab/0x150
>> >>        [<c01ce8e1>] nfsd_vfs_read+0x1ed/0x2d0
>> >>        [<c01ced50>] nfsd_read+0x82/0x99
>> >>        [<c01d42bc>] nfsd3_proc_read+0xdf/0x12a
>> >>        [<c01cb40b>] nfsd_dispatch+0xcf/0x19e
>> >>        [<c033f484>] svc_process+0x3b3/0x68b
>> >>        [<c01cb939>] nfsd+0x168/0x26b
>> >>        [<c0103747>] kernel_thread_helper+0x7/0x10
>> >>        [<ffffffff>] 0xffffffff
>
> Oh, yeah, that. Direct inode reclaim through memory pressure.
>
> Effectively memory reclaim inverts locking order w.r.t. iprune_mutex
> when it recurses into the filesystem. False positive - can never
> cause a deadlock on XFS. Can't be solved from the XFS side of things
> without effectively turning off lockdep checking for xfs inode
> locking.

Yes, it is not a deadlock, but the machine hangs for a few seconds.
It still happens about once a day for me, and every report looks
similar to the one above. I cannot reproduce it quickly, so bisecting
is not possible.

> The fix needs to be made to lockdep via iprune_mutex annotations here....
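Something like the sketch below, I assume? Untested, and the
placement in prune_icache() is my guess, not a tested patch: take
iprune_mutex in its own lockdep subclass, so the acquisition made by
the reclaim path is not chained against the one reached from
filesystem context.

/* fs/inode.c - sketch only, assuming a single subclass suffices */
static void prune_icache(int nr_to_scan)
{
	LIST_HEAD(freeable);

	/*
	 * SINGLE_DEPTH_NESTING puts this acquisition in a separate
	 * lockdep subclass, so taking fs locks under it (dispose_list
	 * -> clear_inode -> xfs_fs_clear_inode -> xfs_ilock) is not
	 * recorded against the class seen when fs code already
	 * holding those locks recurses into reclaim.
	 */
	mutex_lock_nested(&iprune_mutex, SINGLE_DEPTH_NESTING);
	/* ... walk inode_unused, moving victim inodes to &freeable ... */
	dispose_list(&freeable);
	mutex_unlock(&iprune_mutex);
}

Whether a single subclass is actually enough to break the reported
chain I do not know; the sketch is only meant to show the shape of
the annotation.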
>
>> May  9 02:16:46 nomad64 kernel: [42951853.992965] the existing dependency chain (in reverse order) is:
>> May  9 02:16:46 nomad64 kernel: [42951853.992967]
>> May  9 02:16:46 nomad64 kernel: [42951853.992968] -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>> May  9 02:16:46 nomad64 kernel: [42951853.992974]  [<ffffffff80261d72>] __lock_acquire+0xf92/0x1080
>> May  9 02:16:46 nomad64 kernel: [42951853.992989]  [<ffffffff80261f02>] lock_acquire+0xa2/0xd0
>> May  9 02:16:46 nomad64 kernel: [42951853.993002]  [<ffffffff80255556>] down_write_nested+0x46/0x80
>> May  9 02:16:46 nomad64 kernel: [42951853.993018]  [<ffffffff80387fb9>] xfs_ilock+0x99/0xa0
>> May  9 02:16:46 nomad64 kernel: [42951853.993034]  [<ffffffff803a5117>] xfs_free_eofblocks+0x1c7/0x250
>> May  9 02:16:46 nomad64 kernel: [42951853.993049]  [<ffffffff803a8a26>] xfs_release+0x186/0x1d0
>> May  9 02:16:46 nomad64 kernel: [42951853.993062]  [<ffffffff803aeeb0>] xfs_file_release+0x10/0x20
>> May  9 02:16:46 nomad64 kernel: [42951853.993076]  [<ffffffff802a01cc>] __fput+0xcc/0x1c0
>> May  9 02:16:46 nomad64 kernel: [42951853.993091]  [<ffffffff802a05e6>] fput+0x16/0x20
>> May  9 02:16:46 nomad64 kernel: [42951853.993105]  [<ffffffff8028865a>] remove_vma+0x4a/0x80
>> May  9 02:16:46 nomad64 kernel: [42951853.993120]  [<ffffffff802894e1>] do_munmap+0x281/0x2e0
>> May  9 02:16:46 nomad64 kernel: [42951853.993134]  [<ffffffff8028958b>] sys_munmap+0x4b/0x70
>> May  9 02:16:46 nomad64 kernel: [42951853.993148]  [<ffffffff8020b62b>] system_call_after_swapgs+0x7b/0x80
>> May  9 02:16:46 nomad64 kernel: [42951853.993161]  [<ffffffffffffffff>] 0xffffffffffffffff
>
> hmmmm. Sounds like:
>
>	fd = open()
>	addr = mmap(fd)
>	close(fd)
>	.....
>	munmap(addr);
>
> But yes, XFS takes locks in ->release which means.....
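For the record, here is the sequence above spelled out as a
standalone program. The file name is made up; any file on XFS whose
last reference ends up held by the vma will do.

/* demo.c - build with: gcc -o demo demo.c */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("somefile", O_RDONLY);	/* file gets a reference */
	void *addr;

	if (fd < 0)
		exit(1);

	addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		exit(1);

	close(fd);	/* drops the fd reference; the vma still holds one */

	/* ..... */

	munmap(addr, 4096);	/* last reference goes here: remove_vma()
				 * -> fput() -> xfs_file_release() ->
				 * xfs_ilock(), all under mmap_sem */
	return 0;
}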
>
>> May  9 02:16:46 nomad64 kernel: [42951853.993293] Call Trace:
>> May  9 02:16:46 nomad64 kernel: [42951853.993297]  [<ffffffff8025f2b3>] print_circular_bug_tail+0x83/0x90
>> May  9 02:16:46 nomad64 kernel: [42951853.993302]  [<ffffffff80261b90>] __lock_acquire+0xdb0/0x1080
>> May  9 02:16:46 nomad64 kernel: [42951853.993306]  [<ffffffff80222bbd>] ? do_page_fault+0xdd/0x890
>> May  9 02:16:46 nomad64 kernel: [42951853.993310]  [<ffffffff80261f02>] lock_acquire+0xa2/0xd0
>> May  9 02:16:46 nomad64 kernel: [42951853.993313]  [<ffffffff80222bbd>] ? do_page_fault+0xdd/0x890
>> May  9 02:16:46 nomad64 kernel: [42951853.993317]  [<ffffffff806b887b>] down_read+0x3b/0x70
>> May  9 02:16:46 nomad64 kernel: [42951853.993320]  [<ffffffff80222bbd>] do_page_fault+0xdd/0x890
>> May  9 02:16:46 nomad64 kernel: [42951853.993324]  [<ffffffff806ba5dd>] error_exit+0x0/0xa9
>> May  9 02:16:46 nomad64 kernel: [42951853.993328]  [<ffffffff802739b6>] ? file_read_actor+0x46/0x1b0
>> May  9 02:16:46 nomad64 kernel: [42951853.993331]  [<ffffffff806ba3d6>] ? _read_unlock_irq+0x36/0x60
>> May  9 02:16:46 nomad64 kernel: [42951853.993335]  [<ffffffff80275dbc>] ? generic_file_aio_read+0x2cc/0x5d0
>> May  9 02:16:46 nomad64 kernel: [42951853.993339]  [<ffffffff8025ddb9>] ? get_lock_stats+0x19/0x70
>> May  9 02:16:46 nomad64 kernel: [42951853.993343]  [<ffffffff803b2769>] ? xfs_read+0x139/0x220
>> May  9 02:16:46 nomad64 kernel: [42951853.993347]  [<ffffffff803af06d>] ? xfs_file_aio_read+0x4d/0x60
>> May  9 02:16:46 nomad64 kernel: [42951853.993350]  [<ffffffff8029eeb1>] ? do_sync_read+0xf1/0x130
>> May  9 02:16:46 nomad64 kernel: [42951853.993354]  [<ffffffff802516e0>] ? autoremove_wake_function+0x0/0x40
>> May  9 02:16:46 nomad64 kernel: [42951853.993358]  [<ffffffff8026089a>] ? trace_hardirqs_on+0xda/0x170
>> May  9 02:16:46 nomad64 kernel: [42951853.993361]  [<ffffffff80272e45>] ? __rcu_read_unlock+0xb5/0xc0
>> May  9 02:16:46 nomad64 kernel: [42951853.993365]  [<ffffffff8026089a>] ? trace_hardirqs_on+0xda/0x170
>> May  9 02:16:46 nomad64 kernel: [42951853.993369]  [<ffffffff803c4381>] ? security_file_permission+0x11/0x20
>> May  9 02:16:46 nomad64 kernel: [42951853.993374]  [<ffffffff8029f794>] ? vfs_read+0xc4/0x160
>> May  9 02:16:46 nomad64 kernel: [42951853.993377]  [<ffffffff8029fc30>] ? sys_read+0x50/0x90
>> May  9 02:16:46 nomad64 kernel: [42951853.993380]  [<ffffffff8020b62b>] ? system_call_after_swapgs+0x7b/0x80
>
> Oh, joy - a page fault during a read() call triggers lock order
> inversions on mmap_sem. I don't think this can deadlock (we can't be
> page faulting in a vma that is being torn down), but it's clear from
> the last trace that the VM has an mmap_sem inversion problem with
> ->release vs ->read and page faults...
>
> Basically what we are seeing here in both cases is that the VM is
> calling the inode ->release or ->clear_inode methods with different
> high-level locks held. If the filesystem has to take the same locks
> in these methods as it does in, say, ->read (like XFS does), then we
> are guaranteed to get reports like this. AFAICT there's nothing we
> can do from the filesystem perspective to prevent false positives
> like this from being reported....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
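To make the shape of both reports concrete: they reduce to the
classic AB-BA pattern below. This is a userspace stand-in, not
kernel code; the "gate" lock plays the role of the reason the real
kernel cannot deadlock (the inode being reclaimed is never the one
already locked), yet an order checker (lockdep there, helgrind here)
still reports the inversion.

/* abba.c - build with: gcc -pthread -o abba abba.c */
#include <pthread.h>

static pthread_mutex_t iolock = PTHREAD_MUTEX_INITIALIZER; /* i_iolock     */
static pthread_mutex_t iprune = PTHREAD_MUTEX_INITIALIZER; /* iprune_mutex */
static pthread_mutex_t gate   = PTHREAD_MUTEX_INITIALIZER;

static void *fs_path(void *arg)		/* read()/madvise() into reclaim */
{
	pthread_mutex_lock(&gate);
	pthread_mutex_lock(&iolock);	/* xfs_ilock()                   */
	pthread_mutex_lock(&iprune);	/* shrink_icache_memory()        */
	pthread_mutex_unlock(&iprune);
	pthread_mutex_unlock(&iolock);
	pthread_mutex_unlock(&gate);
	return arg;
}

static void *reclaim_path(void *arg)	/* prune_icache()                */
{
	pthread_mutex_lock(&gate);
	pthread_mutex_lock(&iprune);
	pthread_mutex_lock(&iolock);	/* dispose_list() -> xfs_ilock() */
	pthread_mutex_unlock(&iolock);
	pthread_mutex_unlock(&iprune);
	pthread_mutex_unlock(&gate);
	return arg;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, fs_path, NULL);
	pthread_create(&t2, NULL, reclaim_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}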