On Sat, Apr 21, 2012 at 10:42:56AM +1000, Dave Chinner wrote:
> On Fri, Apr 20, 2012 at 01:19:46PM -0500, Mark Tinguely wrote:
> > On 03/27/12 11:44, Christoph Hellwig wrote:
> > >Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
> > >and write back the buffers per-process instead of by waking up xfsbufd.
.....
> > >Given that xfsailds now also handles writing out buffers the conditions for
> > >log forcing and the sleep times needed some small changes. The most
> > >important one is that we consider an AIL busy as long we still have buffers
> > >to push, and the other one is that we do increment the pushed LSN for
> > >buffers that are under flushing at this moment, but still count them towards
> > >the stuck items for restart purposes. Without this we could hammer on stuck
> > >items without ever forcing the log and not make progress under heavy random
> > >delete workloads on fast flash storage devices.
> > >
> > >Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> >
> > Test 106 runs to completion with patch 06.
> >
> > Patch 07 and 08 do not compile without patch 09.
> >
> > Starting with patch 09, I get the following hang on every test 106:
>
> FYI, test 106 is not in the auto group, which means it typically
> isn't run on regression test runs by anyone who isn't modifying
> quota code. That'll be why nobody else is seeing this.

And I can reproduce it easily enough w/ 106.

> > ID: 27992  TASK: ffff8808310d00c0  CPU: 2  COMMAND: "mount"
> >  #0 [ffff880834237938] __schedule at ffffffff81417200
> >  #1 [ffff880834237a80] schedule at ffffffff81417574
> >  #2 [ffff880834237a90] schedule_timeout at ffffffff81415805
> >  #3 [ffff880834237b30] wait_for_common at ffffffff81416a67
> >  #4 [ffff880834237bc0] wait_for_completion at ffffffff81416bd8
> >  #5 [ffff880834237bd0] xfs_buf_iowait at ffffffffa04fc5a5 [xfs]
> >  #6 [ffff880834237c00] xfs_buf_delwri_submit at ffffffffa04fe4b9 [xfs]
> >  #7 [ffff880834237c40] xfs_qm_quotacheck at ffffffffa055cb2d [xfs]
> >  #8 [ffff880834237cc0] xfs_qm_mount_quotas at ffffffffa055cdf0 [xfs]
> >  #9 [ffff880834237cf0] xfs_mountfs at ffffffffa054c041 [xfs]
> > #10 [ffff880834237d40] xfs_fs_fill_super at ffffffffa050ca80 [xfs]
> > #11 [ffff880834237d70] mount_bdev at ffffffff81150c5c
> > #12 [ffff880834237de0] xfs_fs_mount at ffffffffa050ac00 [xfs]
> > #13 [ffff880834237df0] mount_fs at ffffffff811505f8
> > #14 [ffff880834237e40] vfs_kern_mount at ffffffff8116c070
> > #15 [ffff880834237e80] do_kern_mount at ffffffff8116c16e
> > #16 [ffff880834237ec0] do_mount at ffffffff8116d6f0
> > #17 [ffff880834237f20] sys_mount at ffffffff8116d7f3
> > #18 [ffff880834237f80] system_call_fastpath at ffffffff814203b9
>
> And event trace is going to be the only way to find out why it is
> still waiting.....

Interesting. The buffer that we have hung on looks strange:

xfs_buf_read:      dev 253:16 bno 0x4d8 len 0x1000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK caller xfs_buf_readahead
xfs_buf_iorequest: dev 253:16 bno 0x4d8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|PAGES caller _xfs_buf_read

What's really strange about that is the flags. The first trace is the
IO request type trace - the flags indicate the type of IO to be done,
and here it is clearly readahead. The second is the IO dispatch, where
the flags come from the *buffer*, and according to the xfs_buf_alloc
code:

	/*
	 * We don't want certain flags to appear in b_flags.
	 */
	flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);

And according to the xfs_buf.h code:

	{ XBF_LOCK,		"LOCK" },	/* should never be set */\
	{ XBF_TRYLOCK,		"TRYLOCK" },	/* ditto */\
	{ XBF_DONT_BLOCK,	"DONT_BLOCK" },	/* ditto */\

So clearly these don't all match up. We're letting the trylock flag
slip through, but this should be harmless. However, the XBF_READ_AHEAD
flag is clearly showing up, but it is supposed to be there across the
IO (set in _xfs_buf_read, cleared in xfs_buf_ioend) so we can ignore
that. The trylock is slipping through because it is not blocked like
it should be by xfs_buf_alloc(), but it is otherwise unused so that
isn't an issue.

That leaves the only possible source of the problem being the ASYNC
flag. We don't clear the async flag during IO completion because it is
used throughout IO completion to determine exactly what to do, and
completion has no idea if it is the final completion or not, and hence
we can't sanely clear it. Hence it has to be cleared on IO submission.

That's the bug. In __xfs_buf_delwri_submit(), the buffer submit does:

		if (!wait) {
			bp->b_flags |= XBF_ASYNC;
			list_del_init(&bp->b_list);
		}

whereas the old xfs_flush_buftarg() code does:

		if (wait) {
			bp->b_flags &= ~XBF_ASYNC;
			list_add(&bp->b_list, &wait_list);
		}

Subtle difference, and one that I missed on review. That is, the old
code assumes that the XBF_ASYNC flag is already set because it is, in
fact, set in xfs_buf_delwri_queue():

	bp->b_flags |= XBF_DELWRI | _XBF_DELWRI_Q | XBF_ASYNC;

That is, for a buffer to be on the delwri queue, it has to have
XBF_ASYNC set on it. Hence for a blocking flush, we need to clear that
flag for IO completion to send wakeups. The new xfs_buf_delwri_queue(),
however, now only does:

	bp->b_flags |= _XBF_DELWRI_Q;

so the flushing code can make no assumptions about the state of the
XBF_ASYNC flag at all. Hence the new code in __xfs_buf_delwri_submit()
must clear the XBF_ASYNC flag to get it into a known state for both
blocking and async flushes:

	list_for_each_entry_safe(bp, n, submit_list, b_list) {
-		bp->b_flags &= ~_XBF_DELWRI_Q;
+		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
		bp->b_flags |= XBF_WRITE;
		if (!wait) {
			bp->b_flags |= XBF_ASYNC;
			list_del_init(&bp->b_list);
		}
		xfs_bdstrat_cb(bp);
	}
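To make the failure mode concrete: IO completion only wakes a thread
sleeping in xfs_buf_iowait() when the buffer is not marked XBF_ASYNC,
and the readahead that originally populated this buffer left XBF_ASYNC
set on it. Here's a quick standalone model of the flag handling (plain
userspace C, illustrative flag values, not the kernel code) showing
that the old submit path leaves a blocking flush holding an async
buffer, while the fixed path does not:

	/*
	 * Illustrative userspace model only, not kernel code; the flag
	 * values are made up.  The point is the decision completion
	 * makes: an XBF_ASYNC buffer is simply released, a synchronous
	 * one wakes whoever is sleeping in xfs_buf_iowait().
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define XBF_ASYNC	0x01
	#define XBF_WRITE	0x02
	#define _XBF_DELWRI_Q	0x04

	/* does IO completion wake a thread blocked in xfs_buf_iowait()? */
	static bool ioend_wakes_waiter(unsigned int flags)
	{
		return !(flags & XBF_ASYNC);
	}

	/* model of the __xfs_buf_delwri_submit() flag handling */
	static unsigned int delwri_submit(unsigned int flags, bool wait,
					  bool fixed)
	{
		if (fixed)
			flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
		else
			flags &= ~_XBF_DELWRI_Q;
		flags |= XBF_WRITE;
		if (!wait)
			flags |= XBF_ASYNC;
		return flags;
	}

	int main(void)
	{
		/* buffer queued delwri after async readahead: ASYNC still set */
		unsigned int queued = _XBF_DELWRI_Q | XBF_ASYNC;

		printf("old submit, blocking flush:   %s\n",
		       ioend_wakes_waiter(delwri_submit(queued, true, false)) ?
				"waiter woken" : "waiter never woken -> hang");
		printf("fixed submit, blocking flush: %s\n",
		       ioend_wakes_waiter(delwri_submit(queued, true, true)) ?
				"waiter woken" : "waiter never woken -> hang");
		return 0;
	}

Run that and the old path reports the waiter is never woken - which is
exactly the hang above, where quotacheck's blocking
xfs_buf_delwri_submit() sits in xfs_buf_iowait() forever.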
I'll update the patch in my stack to make this change, and write a new
patch on top of my buffer flag cleanups to clean up the flag exclusion
code (because XBF_LOCK is already gone!).

With the above fix, 106 runs to completion, but it still fails because
the golden output is full of unfiltered numbers and irrelevant stuff.
I'd say that's why the test is not in the auto group - it needs work
to give reliable, machine-independent output...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs