On Fri, Aug 14, 2009 at 03:58:47PM -0700, Andrew Morton wrote:
> On Fri, 14 Aug 2009 17:25:05 +0200
> Nick Piggin <npiggin@xxxxxxx> wrote:
> >
> > We have had a report of memory allocation hangs during DVD-RAM (UDF) writing.
> > Jan tracked the cause of this down to UDF inode reclaim blocking:
> >
> > gnome-screens D ffff810006d1d598 0 20686 1
> > ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
> > ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
> > ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
> > Call Trace:
> > [<ffffffff804477f3>] io_schedule+0x63/0xa5
> > [<ffffffff802c2587>] sync_buffer+0x3b/0x3f
> > [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79
> > [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77
> > [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21
> > [<ffffffff802c442a>] __bread+0x70/0x86
> > [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a
> > [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c
> > [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b
> > [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394
> > [<ffffffff802bd205>] write_inode_now+0x7d/0xc4
> > [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53
> > [<ffffffff802b39ae>] clear_inode+0xc2/0x11b
> > [<ffffffff802b3ab1>] dispose_list+0x5b/0x102
> > [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213
> > [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
> > [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
> > [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
> > [<ffffffff802951fa>] alloc_page_vma+0x176/0x189
> > [<ffffffff802822d8>] __do_fault+0x10c/0x417
> > [<ffffffff80284232>] handle_mm_fault+0x466/0x940
> > [<ffffffff8044b922>] do_page_fault+0x676/0xabf
> >
> > Which blocks with the inode lock held, which then blocks other
> > reclaimers:
> >
> > X D ffff81009d47c400 0 17285 14831
> > ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
> > ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
> > ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
> > Call Trace:
> > [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
> > [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
> > [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
> > [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
> > [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
> > [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
> > [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
> > [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
> > [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
> > [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73
> > [<ffffffff802ad949>] do_select+0x308/0x506
> > [<ffffffff802adced>] core_sys_select+0x1a6/0x254
> > [<ffffffff802ae0b7>] sys_select+0xb5/0x157
>
> That isn't a hang. When the bread() completes, everything proceeds.

Yes, sorry, bad wording (from our bug report). It isn't a hang, just
really bad latency, and everything stops for a while.

> > Now I think the main problem is having the filesystem block (and do IO)
> > in inode reclaim. The problem is that this doesn't get accounted well
> > and penalizes a random allocator with a big latency spike caused by
> > work generated from elsewhere.
>
> Yes. Why does UDF do all that stuff in ->clear_inode()? Other
> filesystems have very simple, non-blocking, non-IO-doing
> ->clear_inode() implementations. This sounds like a design problem
> within UDF.

Yep.
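To make the "defer the hard work to an asynchronous context" option
below a bit more concrete, here is a rough, untested sketch (the struct,
function names and the snapshot approach are all hypothetical
illustrations, not UDF's actual code): ->clear_inode() would copy the
on-disk inode state and queue the blocking write, because the in-core
inode is about to be freed and so cannot be referenced from the work
item.

/*
 * Hypothetical, untested sketch only.
 * Needs linux/workqueue.h, linux/buffer_head.h, linux/slab.h.
 */
struct udf_deferred_write {
	struct work_struct work;	/* queued with schedule_work() */
	struct super_block *sb;
	sector_t block;			/* on-disk location of the inode (ICB) */
	void *data;			/* sb->s_blocksize snapshot to write */
};

/* Runs later from a workqueue in process context, not from reclaim. */
static void udf_deferred_write_fn(struct work_struct *w)
{
	struct udf_deferred_write *dw =
		container_of(w, struct udf_deferred_write, work);
	struct buffer_head *bh;

	bh = sb_getblk(dw->sb, dw->block);	/* blocking is harmless here */
	if (bh) {
		lock_buffer(bh);
		memcpy(bh->b_data, dw->data, dw->sb->s_blocksize);
		set_buffer_uptodate(bh);
		unlock_buffer(bh);
		mark_buffer_dirty(bh);
		brelse(bh);
	}
	kfree(dw->data);
	kfree(dw);
}

->clear_inode() would then fill in a udf_deferred_write (allocated with
GFP_NOFS), INIT_WORK() it with udf_deferred_write_fn and schedule_work()
it, instead of calling write_inode_now(). Throttling the queue depth is
the part glossed over here, as noted below.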
> > I think the best idea would be to avoid this. By design if possible,
> > or by deferring the hard work to an asynchronous context. If the latter,
> > then the fs would probably want to throttle creation of new work with
> > queue size of the deferred work, but let's not get into those details.
> >
> > Anyway, another obvious thing we looked at is the iprune_mutex which
> > is causing the cascading blocking. We could turn this into an rwsem to
> > improve concurrency. It is unreasonable to totally ban all potentially
> > slow or blocking operations in inode reclaim, so I think this is a cheap
> > way to get a small improvement.
> >
> > This doesn't solve the whole problem of course. The process doing inode
> > reclaim will still take the latency hit, and concurrent processes may
> > end up contending on filesystem locks. So fs developers should keep
> > these problems in mind please (or discuss alternatives).
> >
> > Jan points out this has the potential to uncover concurrency bugs in fs
> > code.
> >
> > Comments?
>
> I bet you found that nice comment over iprune_mutex to be useful, no?
>
> That comment needs updating by this patch, btw.

Yes will do.

> > -	mutex_unlock(&iprune_mutex);
> > +	up_write(&iprune_sem);
>
> yup, the patch looks OK. inode_lock protects the lists and each thread
> will make each inode ineligible for lookup by other threads, so it's
> hard to see how there could be races in the VFS code.

I guess dispose_list does a bit of stuff unserialised now. However most
of the same stuff gets done in inode deletion as well, so I think it is
pretty low risk.
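For completeness, the shape of the conversion is roughly the following
(an abridged sketch of the lock sites only, not the actual diff):

/* fs/inode.c */
static DECLARE_RWSEM(iprune_sem);	/* replaces DEFINE_MUTEX(iprune_mutex) */

static void prune_icache(int nr_to_scan)
{
	LIST_HEAD(freeable);

	down_read(&iprune_sem);		/* shared: reclaimers no longer serialise */
	/* ... scan inode_unused under inode_lock, moving victims to freeable ... */
	dispose_list(&freeable);	/* may now run concurrently with other pruners */
	up_read(&iprune_sem);
}

int invalidate_inodes(struct super_block *sb)
{
	int busy = 0;			/* set by invalidate_list() in the real code */

	down_write(&iprune_sem);	/* exclusive: umount still excludes all pruners */
	/* ... invalidate_list() and dispose_list() ... */
	up_write(&iprune_sem);
	return busy;
}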