Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it

On Wed, Feb 15, 2023 at 10:25:43AM -0500, Brian Foster wrote:
> On Wed, Feb 15, 2023 at 09:20:00AM +1100, Dave Chinner wrote:
> > On Tue, Feb 14, 2023 at 01:10:05PM -0500, Brian Foster wrote:
> > > On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > 
> > > > Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> > > > after writeback errors") XFS and iomap have been retaining dirty
> > > > folios in memory after a writeback error. XFS no longer invalidates
> > > > the folio, and iomap no longer clears the folio uptodate state.
> > > > 
> > > > However, iomap is still calling ->discard_folio on error, and
> > > > XFS is still punching the delayed allocation range backing the dirty
> > > > folio.
> > > > 
> > > > This is incorrect behaviour. The folio remains dirty and up to date,
> > > > meaning that another writeback will be attempted in the near future.
> > > > This means that XFS is still going to have to allocate space for it
> > > > during writeback, and that means it still needs to have a delayed
> > > > allocation reservation and extent backing the dirty folio.
> > > > 
> > > 
> > > Hmm.. I don't think that is correct. It looks like the previous patch
> > > removes the invalidation, but writeback clears the dirty bit before
> > > calling into the fs and we're not doing anything to redirty the folio,
> > > so there's no guarantee of subsequent writeback.
> > 
> > Ah, right, I got confused with iomap_do_writepage() which redirties
> > folios it performs no action on. The case that is being tripped here
> > is "count == 0" which means no action has actually been taken on the
> > folio and it is not submitted for writeback. We don't mark the folio
> > with an error on submission failure like we do for errors reported
> > to IO completion, so the folio is just left in its current state
> > in the cache.
> > 
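To make the mechanics concrete, the submission failure path in
question looks something like this (paraphrased from
iomap_writepage_map() from memory - a sketch, not the exact
upstream code):

        if (unlikely(error)) {
                /*
                 * Tell the filesystem what range failed to map. For
                 * XFS, ->discard_folio() punches out the delalloc
                 * extent backing the folio.
                 */
                if (wpc->ops->discard_folio)
                        wpc->ops->discard_folio(folio, pos);
                if (!count) {
                        /*
                         * Nothing was added to an ioend, so the folio
                         * is not under writeback and no error gets set
                         * on the mapping. The VM already cleared the
                         * dirty bit, so the folio is left clean and
                         * uptodate in the cache.
                         */
                        folio_unlock(folio);
                        goto done;
                }
        }
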
> > > Regardless, I can see how this prevents this sort of error in the
> > > scenario where writeback fails due to corruption, but I don't see how it
> > > doesn't just break error handling of writeback failures not associated
> > > with corruption.
> > 
> > What other cases in XFS do we have that cause mapping failure? We
> > can't get ENOSPC here because of delalloc reservations. We can't get
> > ENOMEM because all the memory allocations are blocking. That just
> > leaves IO errors reading metadata, or structure corruption when
> > parsing and modifying on-disk metadata.  I can't think (off the top
> > of my head) of any other type of error we can get returned from
> > allocation - what sort of non-corruption errors were you thinking
> > of here?
> > 
> > > fails due to some random/transient error, delalloc is left around on a
> > > !dirty page (i.e. stale), and reclaim eventually comes around and
> > > results in the usual block accounting corruption associated with stale
> > > delalloc blocks.
> > 
> > The first patches in the series fix those issues. If we get stray
> > delalloc extents on a healthy inode, then it will still trigger all
> > the warnings/asserts that we have now. But if the inode has been
> > marked sick by a corruption based allocation failure, it will clean
> > up in reclaim without leaking anything or throwing any new warnings.
> > 
> 
> Those warnings/asserts that exist now indicate something is wrong and
> that free space accounting is likely about to become corrupted, because
> an otherwise clean inode is being reclaimed with stale delalloc blocks.

Well, yes.

> I see there's an error injection knob (XFS_ERRTAG_REDUCE_MAX_IEXTENTS)
> tied to the max extent count checking stuff in the delalloc conversion
> path. You should be able to add some (10+) extents to a file and then
> turn that thing all the way up to induce a (delalloc conversion)
> writeback failure and see exactly what I'm talking about [1].
> 
> Brian
> 
> [1] The following occurs with this patch, but not on mainline because the
> purpose of ->discard_folio() is to prevent it.

A non-corruption-related writeback error has resulted in those debug
checks triggering correctly. This demonstrates the debug checks are
still working as intended. :)

Hence this isn't an argument against removing ->discard_folio(); it's
merely a demonstration that the current patch series needs more work.

Indeed, if the folio gets redirtied here instead of being left clean
as we've already talked about, a future writeback may, in fact,
succeed and this specific problem goes away. We know how this retry
mechanism works - it's exactly what we do with metadata write
failures. Further, changing the behaviour of failure handling here
is exactly what we have the configurable error handling
infrastructure for. It's also why the "fail at unmount"
functionality exists.
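
As a rough sketch of the redirtying idea (untested; assumes
folio_redirty_for_writepage() can be used here the same way
iomap_do_writepage() uses it):

        if (unlikely(error) && !count) {
                /*
                 * Mapping failed and nothing was submitted for IO.
                 * Leave the delalloc extent alone and redirty the
                 * folio so a future writeback attempt retries the
                 * allocation - the same retry model we already use
                 * for metadata write failures.
                 */
                folio_redirty_for_writepage(wbc, folio);
                folio_unlock(folio);
                goto done;
        }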

That is, if we get to the point that "fail at unmount" triggers for
metadata we cannot write back due to persistent errors, we should
also perform the same trigger for data we cannot write back due to
persistent writeback allocation failures. In that case, any
allocation error should mark the inode sick, and then the
unconverted delalloc extents get cleaned up correctly by the final
inode reclaim pass.
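
Sketching that out (illustrative only - the exact callsite and sick
mask would need more thought, XFS_SICK_INO_BMBTD is just an example
here), the writeback allocation failure path in xfs_map_blocks()
would do something like:

        error = xfs_convert_blocks(wpc, ip, whichfork, offset);
        if (error) {
                /*
                 * Persistent writeback allocation failure. Mark the
                 * inode sick so final inode reclaim knows to clean
                 * up the remaining delalloc extents without issuing
                 * stale delalloc warnings.
                 */
                xfs_inode_mark_sick(ip, XFS_SICK_INO_BMBTD);
                return error;
        }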

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


