On Thu, Nov 17, 2022 at 10:59:11AM -0800, Darrick J. Wong wrote:
> On Thu, Nov 17, 2022 at 04:58:09PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>

[snip code, I'm on PTO for the next couple of days so, just a quick
process answer here...]

> So the next question is -- how should we regression-test the
> revalidation schemes in the write and writeback paths?  Do you have
> something ready to go that supersedes what I built in patches 13-16 of
> https://lore.kernel.org/linux-xfs/166801781760.3992140.10078383339454429922.stgit@magnolia/T/#u

Short answer is no.

Longer answer is that I haven't needed to write new tests to exercise
the code added to fix the bug. I've found that g/346 stresses the
IOMAP_F_STALE path quite well because it mixes racing unaligned
sub-folio write() calls with mmap write faults, often to the same
folio. It's similar in nature to the original reproducer in that it
does racing concurrent ascending offset unaligned sub-block writes to a
single file.

g/346 repeatedly found data corruptions (it's a data integrity test) as
a result of the delalloc punch code doing the wrong thing with 1kB
block size, as well as with 4kB block size when the mmap page faults
instantiated multi-page folios....

g/269 and g/270 also seem to trigger IOMAP_F_STALE conditions quite
frequently - streaming writes at ENOSPC with fsstress running in the
background executing sync() operations mean writeback is racing with
the streaming writes all the time. These tests exposed bugs that caused
stale delalloc blocks to be left behind by the delalloc punch code.

fsx also tripped over a couple of corruptions when being run with
buffered writes. Because fsx is single threaded, this implies that it
was writeback that was triggering the IOMAP_F_STALE write()
invalidations....

So from an "exercise the IOMAP_F_STALE write() case causing iomap
invalidation, delalloc punching and continuing to complete the rest of
the write" perspective, I think we've got a fair bunch of existing
tests that cover both the "racing mmap dirties data in the punch range"
and the "writeback/racing mmap triggers extent changes so triggers
IOMAP_F_STALE" cases.

As for the specific data corruption reproducer, I haven't done anything
other than run the original regression test. I've been using it as,
well, a regression test. I haven't had a chance to look at any of the
other variants that have been written, because all the actual
development was done running "-g rw -g enospc" on 1kB block size
filesystems and repeatedly running g/346 and g/270 until they passed
tens of iterations in a row. I only ran the original regression test to
confirm that I hadn't broken the fix whilst getting all the fstests to
pass....

> Please let me know what you're thinking.

I'll look at the other tests next week. Until then, I can't really
comment on them.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx