Re: [PATCH 4/7] xfs: buffered write failure should not truncate the page cache

On Wed, Nov 02, 2022 at 09:41:30AM -0700, Darrick J. Wong wrote:
> On Tue, Nov 01, 2022 at 11:34:09AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > xfs_buffered_write_iomap_end() currently invalidates the page cache
> > over the unused range of the delalloc extent it allocated. While the
> > write allocated the delalloc extent, it does not own it exclusively
> > as the write does not hold any locks that prevent either writeback
> > or mmap page faults from changing the state of either the page cache
> > or the extent state backing this range.
> > 
> > Whilst xfs_bmap_punch_delalloc_range() already handles races in
> > extent conversion - it will only punch out delalloc extents and it
> > ignores any other type of extent - the page cache truncate does not
> > discriminate between data written by this write or some other task.
> > As a result, truncating the page cache can result in data corruption
> > if the write races with mmap modifications to the file over the same
> > range.
> > 
> > generic/346 exercises this workload, and if we randomly fail writes
> > (as will happen when iomap gets stale iomap detection later in the
> > patchset), it will randomly corrupt the file data because it removes
> > data written by mmap() in the same page as the write() that failed.
> > 
> > Hence we do not want to punch out the page cache over the range of
> > the extent we failed to write to - what we actually need to do is
> > detect the ranges that have dirty data in cache over them and *not
> > punch them out*.
> 
> Same dumb question as hch -- why do we need to punch out the nondirty
> pagecache after a failed write?  If the folios are uptodate then we're
> evicting cache unnecessarily, and if they're !uptodate can't we let
> reclaim do the dirty work for us?

Sorry, we don't punch out the page cache - this was badly worded. I
meant:

"[...] - what we actually need to do is
detect the ranges that have dirty data in cache over the delalloc
extent and retain those regions of the delalloc extent."

i.e. we never punch the page cache anymore, and we only selectively
punch the parts of the delalloc extent that back the clean regions of
the write range...
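
To make that concrete, here's a rough model (plain Python, not the
actual XFS code - the helper name and range representation are made up
for illustration) of the selection logic: given the failed write range
and the sub-ranges that are dirty in the page cache, only the clean
gaps between dirty data are candidates for delalloc punching, and the
dirty regions are retained untouched.

```python
# Hypothetical model of "punch only the clean regions" described above.
# Ranges are half-open (start, end) byte intervals; in the kernel this
# would walk folios in the page cache and check their dirty state.

def clean_ranges_to_punch(write_start, write_end, dirty_ranges):
    """Return the sub-ranges of [write_start, write_end) that are NOT
    covered by any dirty range; only these are safe to punch out of
    the delalloc extent after a failed buffered write."""
    punches = []
    pos = write_start
    for d_start, d_end in sorted(dirty_ranges):
        # Clamp the dirty range to the write range we care about.
        d_start = max(d_start, write_start)
        d_end = min(d_end, write_end)
        if d_start >= d_end:
            continue  # dirty range lies outside the write range
        if pos < d_start:
            punches.append((pos, d_start))  # clean gap before dirty data
        pos = max(pos, d_end)  # retain the dirty region, skip past it
    if pos < write_end:
        punches.append((pos, write_end))  # trailing clean region
    return punches
```

The key property is that data written into the cache by a racing
mmap() fault lands in a dirty range and is therefore never punched.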

> I don't know if there are hysterical raisins for this or if the goal is
> to undo memory consumption after a write failure?  If we're stale-ing
> the write because the iomapping changed, why not leave the folio where
> it is, refresh the iomapping, and come back to (possibly?) the same
> folio?

I can't say for certain - I haven't gone and looked at the history.
I suspect it goes back to the days where write() could write zeroes
into the page cache for eof zeroing or zeroing for file extension
before the write() started writing data. Maybe that was part of what
it was trying to undo?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
