Re: [Bug 202053] [xfstests generic/464]: XFS corruption and Assertion failed: 0, file: fs/xfs/xfs_super.c, line: 985

On Tue, Jan 08, 2019 at 04:55:18PM +1100, Dave Chinner wrote:
> On Mon, Jan 07, 2019 at 02:11:01PM -0500, Brian Foster wrote:
> > On Mon, Jan 07, 2019 at 09:41:14AM -0500, Brian Foster wrote:
> > > On Mon, Jan 07, 2019 at 08:57:37AM +1100, Dave Chinner wrote:
> > > For example, I'm concerned that something like sustained buffered writes
> > > could completely break the writeback imap cache by continuously
> > > invalidating it. I think speculative preallocation should help with this
> > > in the common case by already spreading those writes over fewer
> > > allocations, but do we care enough about the case where preallocation
> > > might be turned down/off to try and restrict where we bump the sequence
> > > number (to > i_size changes, for example)? Maybe it's not worth the
> > > trouble just to optimize out a shared ilock cycle and lookup, since the
> > > extent list is still in-core after all.
> > > 
> > 
> > A follow up FWIW... a quick test of some changes to reuse the existing
> > mechanism doesn't appear to show much of a problem in this regard, even
> > with allocsize=4k. I think another thing that minimizes impact is that
> > even if we end up revalidating the same imap over and over, the ioend
> > construction logic is distinct and based on contiguity. IOW, writeback
> > is still sending the same sized I/Os for contiguous blocks...
> 
> Ah, I think you discovered that the delay between write(),
> ->writepages() and the incoming write throttling in
> balance_dirty_pages() creates a large enough dirty page window that
> we avoid lock-stepping write and writepage in a detrimental way....
> 

That certainly may be another factor, but note that I am able to
reproduce a significant number of spurious invalidations and thus a
significant increase in imap lookups (in the allocsize=4k case). Taking
another look at some trace data, I also see sequences of xfs_iomap_alloc
and xfs_map_blocks_found events in lockstep, though those sequences
aren't terribly long or sustained (which perhaps speaks to your point
wrt buffered write throttling).
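
To make the pattern concrete, here's a toy userspace model of the
seqno scheme (all names are made up for illustration; this is not the
actual XFS code). Each extent list change bumps a per-fork sequence
count, and writeback revalidates its cached map whenever the count has
moved on, even if the map itself hasn't changed:

/*
 * Toy model of the data fork seqno scheme, not the actual XFS code.
 * Names (fork_seq, wb_ctx, etc.) are hypothetical.
 *
 * Every buffered write that changes the extent list (e.g. a delalloc
 * reservation) bumps fork_seq. Writeback caches a map and the seqno
 * it was looked up under; any later bump forces a revalidation even
 * if the cached map still covers the range -- that's the "spurious"
 * inval counted below.
 */
#include <stdbool.h>
#include <stdio.h>

struct map {
	unsigned long start;	/* first block covered */
	unsigned long len;	/* blocks covered */
};

static unsigned int fork_seq;	/* bumped on any extent list change */

struct wb_ctx {
	struct map imap;	/* cached mapping */
	unsigned int seq;	/* fork_seq when imap was cached */
	unsigned long lookups;	/* map lookups performed */
	unsigned long spurious;	/* revals that found the same map */
};

static bool map_covers(struct map *m, unsigned long blk)
{
	return blk >= m->start && blk < m->start + m->len;
}

/* Writeback: use the cached map unless the seqno says it may be stale. */
static void wb_map_block(struct wb_ctx *wpc, unsigned long blk,
			 struct map *cur)
{
	if (map_covers(&wpc->imap, blk) && wpc->seq == fork_seq)
		return;		/* cache hit, no ilock cycle */

	bool same = map_covers(&wpc->imap, blk);

	wpc->lookups++;
	if (same)
		wpc->spurious++;	/* map didn't change, seq did */
	wpc->imap = *cur;
	wpc->seq = fork_seq;
}

int main(void)
{
	/* One big delalloc extent, as preallocation would give us. */
	struct map cur = { .start = 0, .len = 1024 };
	struct wb_ctx wpc = { 0 };

	for (unsigned long blk = 0; blk < cur.len; blk++) {
		fork_seq++;	/* concurrent write extends the file */
		wb_map_block(&wpc, blk, &cur);
	}
	printf("lookups %lu, spurious %lu\n", wpc.lookups, wpc.spurious);
	return 0;
}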

I think the larger point is that the following factors, taken together:

- buffered write throttling (via dirty pages)
- speculative preallocation in XFS throttling data fork changes in the
  common sequential write (file copy) use case
- cached map lookups being fairly cheap (i.e., no extra I/Os) relative
  to the I/O itself
- spurious cached map revals not affecting I/O sizes (see the sketch
  below)

... mean that this isn't some blatant problem that makes global
invalidation a nonstarter for common usage patterns. I wouldn't rule
out other problems, precisely because the spurious invals are there as
noted above, but so far it doesn't seem worth worrying about or
overcomplicating things unless/until we find a use case or workload
noticeably affected by it.
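
For the last point, an equally hypothetical sketch of the ioend side:
the size of the I/O we build is driven purely by physical contiguity,
so a spurious reval between blocks never splits an otherwise
contiguous ioend:

/*
 * Hypothetical sketch, not the actual XFS ioend code: extend the
 * current I/O while blocks are physically adjacent, flush otherwise.
 * Revalidating the cached map between calls doesn't enter into it.
 */
#include <stdio.h>

struct ioend {
	unsigned long start;	/* first physical block */
	unsigned long len;	/* contiguous blocks so far */
};

static void submit(struct ioend *io)
{
	if (io->len)
		printf("I/O: block %lu, %lu blocks\n", io->start, io->len);
	io->len = 0;
}

/* Add one mapped block; extend the ioend if physically adjacent. */
static void add_block(struct ioend *io, unsigned long phys)
{
	if (io->len && phys == io->start + io->len) {
		io->len++;	/* contiguous: grow the ioend */
		return;
	}
	submit(io);		/* discontiguous: flush and restart */
	io->start = phys;
	io->len = 1;
}

int main(void)
{
	struct ioend io = { 0 };

	/* 8 contiguous blocks: one I/O, however often we revalidated. */
	for (unsigned long b = 100; b < 108; b++)
		add_block(&io, b);
	submit(&io);
	return 0;
}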

Brian

> AFAICT, the only time we have to worry about this is if we are so
> short of memory the kernel is cleaning every page as soon as it is
> dirtied. If we get into that situation, invalidating the cached map
> is the least of our worries :P
> 
> Cheers,
> 
> dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx


