On Thu, Sep 21, 2023 at 12:18:13AM -0700, Luis Chamberlain wrote:
> On Thu, Sep 21, 2023 at 04:03:56PM +1000, Dave Chinner wrote:
> > On Wed, Sep 20, 2023 at 09:57:56PM -0700, Luis Chamberlain wrote:
> > > On Wed, Sep 20, 2023 at 08:00:12PM -0700, Luis Chamberlain wrote:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus
> > > >
> > > > I haven't yet tested the second branch I pushed, but it applied
> > > > without any changes so it should be good (usual famous last
> > > > words).
> > >
> > > I have run some preliminary tests on that branch above as well,
> > > using fsx with larger LBA formats, running them all on the *same*
> > > system at the same time. Kernel is happy.

<-- snip -->

> > So I just pulled this, built it and ran generic/091 as the very
> > first test on this:
> >
> > # ./run_check.sh --mkfs-opts "-m rmapbt=1 -b size=64k" --run-opts "-s xfs_64k generic/091"
>
> The cover letter for this patch series acknowledged failures in
> fstests.

But this is a new update, which you said fixed various issues, and
you posted it in direct response to the bug report I gave you.

> For kdevops now, we borrow the same last linux-next baseline:
>
> git grep "generic/091" workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_1024.txt:generic/091 # possible regression
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_16k.txt:generic/091
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_32k.txt:generic/091
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_64k_4ks.txt:generic/091
>
> So well, we already know this fails.

*cough*

-You- know it already fails. And you are expecting people who try
the code to somehow know that you've explicitly ignored this fsx
failure, especially after all your words telling us how much fsx
testing it has passed?

And that's kinda my point - you're effusing about how much fsx
testing this has passed, yet it still fails after just a handful of
ops in generic/091. The dissonance could break windows...

----

Fundamentally, when it comes to data integrity, it is important to
exercise as much of the operational application space as quickly as
possible, because it is that breadth of variation in operations
that flushes out more bugs and helps stabilise the code faster. Why
do you think we talk about the massive test matrix most filesystems
have and how long it takes to iterate so much? It's because
iterating that complex test matrix is how we find all the whacky,
weird bugs in the code.

Concentrating on a single test configuration and running it over
and over again won't find bugs in code it doesn't exercise, no
matter how long it is run for. Running such a setup in an automated
environment doesn't mean you get better code coverage; it just
means you cover the same narrow set of corner cases faster and more
often. If it works once, it should work a million times, and
iterating it a billion more times doesn't tell us anything
additional, either.

Put simply: performing deep, homogeneous testing on code that has
known data corruption bugs outside the narrow scope of the test
case is not telling us anything useful about the overall state of
the code. Indeed, turning off failing tests that are critical to
validating the correct operation of the code you are modifying is
bad practice.
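To make that concrete: "breadth" means sweeping the fsx operation
space, not cranking the op count. Something like this - a sketch
from memory, so check the flags against your fsx build; the file
path is just an example - exercises five different operation sets
instead of hammering on one:

    # same filesystem, same op count, five different operation sets
    fsx -N 10000000 /mnt/test/foo            # buffered IO + mmap
    fsx -N 10000000 -R -W /mnt/test/foo      # buffered IO, mmap off
    fsx -N 10000000 -Z -R -W /mnt/test/foo   # O_DIRECT
    fsx -N 10000000 -A /mnt/test/foo         # AIO
    fsx -N 10000000 -U /mnt/test/foo         # io_uring

Each of those drives a different IO path through the filesystem and
the page cache, so each of them can find corruptions the others
simply cannot reach.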
For code changes like this, all fsx testing in fstests should pass
before you post anything for review - even for an RFC. There is no
point reviewing code that doesn't work properly, nor in wasting
people's time by encouraging them to test it when it's clear to you
that it's going to fail in various important ways.

Hence I think your testing is focussing on the wrong things, and I
suspect that you've misunderstood what various people really meant
when they said "we'll need billions of fsx ops to test this code".
You've elevated running billions of fsx ops to your primary "it
works" gating condition, at the expense of making sure all the
other parts of the filesystem still work correctly.

The reality is that the returns from fsx diminish as the number of
ops goes up. Once you've run the first hundred million fsx ops for
a given operation set, the chance that the next 100M ops will find
a new problem is -greatly- reduced. The vast majority of problems
will be found in the first 10M ops run with any given fsx operation
set, and few bugs are found beyond the 100M mark. Yes, we
occasionally find one up in the billions, but that's rare and most
definitely not something to focus on when still developing RFC
level code.

Different fsx configurations change the operation set that is run -
mixing DIO reads with buffered writes, turning mmap on and off,
using AIO or io_uring rather than synchronous IO, etc. These all
exercise different code paths and corner cases and have vastly
different code interactions, and that is what we need to cover when
developing new code. IOWs, we need coverage of the *entire
operation space*, not just the same narrow set of operations run
billions of times.

A wide focus requires billions of ops to cover because it requires
lots of different application configurations to be run. In
contrast, there are only three fs configurations that matter:
bs < PS, bs == PS and bs > PS. For example, 16kB, 32kB and 64kB
filesystem configs exercise exactly the same code paths in exactly
the same way (e.g. they all have non-zero minimum folio orders and
differ only in what that order is). Hence running the same test
application configs on these different filesystem configurations
does not actually improve code coverage. Testing all of them only
increases the resources required to test a change; it does not
improve the quality of the testing being performed at all....

Hence I'd strongly suggest that, for the next posting of these
changes, you focus on making fstests pass without turning off any
failing tests, and that fsx is run with a wide variety of
configurations - e.g. modify all the fstests fsx test cases to run
for a configurable number of ops (e.g. via SOAK_DURATION - see the
sketch in the PS below). We just don't care at this point about
finding that 1 in 10^15 ops bug because it's code in development;
what we actually care about is that -everything- works correctly
for the vast majority of use cases....

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
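PS: by "a configurable number of ops" I mean wiring the fsx test
cases up to the soak duration logic that generic/521 and
generic/522 already use. Roughly like this - again from memory, so
check the actual tests rather than trusting me; the numbers and
file name are illustrative:

    nr_ops=$((25000 * TIME_FACTOR))
    fsx_args=(-q -S 0 -p $((nr_ops / 100)) -N $nr_ops)
    # switch to a time-bounded run when the tester sets SOAK_DURATION
    test -n "$SOAK_DURATION" && fsx_args+=(--duration="$SOAK_DURATION")
    $here/ltp/fsx "${fsx_args[@]}" $TEST_DIR/fsx_soak

Then the same test runs as a quick smoke test by default and as an
overnight soak when someone sets SOAK_DURATION in their config.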