Re: [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps

On Thu, Nov 17, 2022 at 10:59:11AM -0800, Darrick J. Wong wrote:
> On Thu, Nov 17, 2022 at 04:58:09PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>

[snip code, I'm on PTO for the next couple of days, so just a quick
process answer here...]

> So the next question is -- how should we regression-test the
> revalidation schemes in the write and writeback paths?  Do you have
> something ready to go that supersedes what I built in patches 13-16 of
> https://lore.kernel.org/linux-xfs/166801781760.3992140.10078383339454429922.stgit@magnolia/T/#u

Short answer is no.

Longer answer is that I haven't needed to write new tests to
exercise the code added to fix the bug.

I've found that g/346 stresses the IOMAP_F_STALE path quite well
because it mixes racing unaligned sub-folio write() calls with mmap
write faults, often to the same folio. It's similar in nature to the
original reproducer in that it does racing concurrent ascending
offset unaligned sub-block writes to a single file. 
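
To make that workload pattern concrete, here is a minimal sketch of
racing, ascending-offset, unaligned sub-block writes to a single file,
mixed with stores through a shared mmap, followed by a data integrity
check. It is not the g/346 source and it is not the original
reproducer; the chunk size, thread count, and all function names are
assumptions chosen purely for illustration.

```python
import mmap
import os
import tempfile
import threading

CHUNK = 700      # assumed size: deliberately not a multiple of any block size
NCHUNKS = 64     # assumed file length, in chunks
NWRITERS = 4     # pwrite() threads; one further stream of chunks goes via mmap


def pattern(idx):
    """Return the fill pattern expected in chunk idx (a non-zero byte)."""
    return bytes([idx % 251 + 1]) * CHUNK


def pwrite_worker(fd, worker):
    """Write this worker's share of chunks at ascending, unaligned offsets."""
    for idx in range(worker, NCHUNKS, NWRITERS + 1):
        os.pwrite(fd, pattern(idx), idx * CHUNK)


def mmap_worker(fd, size):
    """Dirty the remaining chunks through a shared, writable mapping."""
    with mmap.mmap(fd, size) as m:
        for idx in range(NWRITERS, NCHUNKS, NWRITERS + 1):
            m[idx * CHUNK:(idx + 1) * CHUNK] = pattern(idx)
        m.flush()


def run_once(path):
    """Run the racing writers once; return the list of corrupted chunks."""
    size = CHUNK * NCHUNKS
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)
        threads = [threading.Thread(target=pwrite_worker, args=(fd, w))
                   for w in range(NWRITERS)]
        threads.append(threading.Thread(target=mmap_worker, args=(fd, size)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        os.fsync(fd)
        data = os.pread(fd, size, 0)
    finally:
        os.close(fd)
    return [i for i in range(NCHUNKS)
            if data[i * CHUNK:(i + 1) * CHUNK] != pattern(i)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad = run_once(os.path.join(tmp, "racefile"))
        print("corrupted chunks:", bad)
```

Every chunk is owned by exactly one writer, so any mismatch at
verification time points at the filesystem rather than at the test
itself. A toy like this is unlikely to hit the race on its own; the
point is only to show the shape of the I/O that the existing tests
generate.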

g/346 repeatedly found data corruptions (it's a data integrity test)
as a result of the delalloc punch code doing the wrong thing with
1kB block size, as well as with 4kB block size when the mmap page
faults instantiated multi-page folios....

g/269 and g/270 also seem to trigger IOMAP_F_STALE conditions quite
frequently - streaming writes at ENOSPC with fsstress running in
the background executing sync() operations mean that writeback is
racing with the streaming writes all the time. These tests exposed
bugs that caused stale delalloc blocks to be left behind by the
delalloc punch code.

fsx also tripped over a couple of corruptions when being run with
buffered writes. Because fsx is single threaded, this implies that
it was writeback that was triggering the IOMAP_F_STALE write()
invalidations....

So from an "exercise the IOMAP_F_STALE write() case causing iomap
invalidation, delalloc punching and continuing to complete the rest
of the write" perspective, I think we've got a fair bunch of
existing tests that cover both the "racing mmap dirties data in the
punch range" and the "writeback/racing mmap triggers extent changes
so triggers IOMAP_F_STALE" cases.

As for the specific data corruption reproducer, I haven't done
anything other than run the original regression test. I've been
using it as, well, a regression test. I haven't had a chance to look
at any of the other variants that have been written, because all the
actual development was done running "-g rw -g enospc" on 1kB block
size filesystems and repeatedly running g/346 and g/270 until they
passed tens of iterations in a row. I only ran the original
regression test to confirm that I hadn't broken the fix whilst
getting all the fstests to pass....
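
The "run it until it passes N times in a row" loop described above can
be sketched as follows. This is a hypothetical helper, not part of
fstests; the function names, the default streak of 20 and the run
limit are all assumptions, and it relies on the test command's exit
status reflecting pass or fail.

```python
import subprocess
import sys


def run_until_stable(run_once, needed=20, max_runs=1000):
    """Call run_once() until it succeeds `needed` times in a row.

    Returns the total number of runs that took, or None if max_runs
    is reached first. Any failure resets the streak to zero.
    """
    streak = 0
    for total in range(1, max_runs + 1):
        streak = streak + 1 if run_once() else 0
        if streak >= needed:
            return total
    return None


def command_passes(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. In an fstests
    # checkout the argv would instead be the ./check invocation for
    # the tests of interest (an assumption about the local setup).
    argv = [sys.executable, "-c", "pass"]
    print(run_until_stable(lambda: command_passes(argv), needed=3))
```

Resetting the streak on every failure matters here: an intermittent
race that fires once in fifteen runs would slip through a simple
"passed most of the time" check, but it cannot produce a long
unbroken run of passes.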

> Please let me know what you're thinking.

I'll look at the other tests next week. Until then, I can't really
comment on them.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


