On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> No real issue with the test, but I wonder if we could do something more
> generic. Various XFS shutdown and log recovery issues went undetected
> for a while until we started adding more of the generic stress tests
> currently categorized in the recoveryloop group.
>
> So for example, I'm wondering if you took something like generic/388 or
> 475 and modified it to start with a smallish fs, grew it in 1GB or
> whatever increments on each loop iteration, and then ran the same
> generic stress/timeout/shutdown/recovery sequence, would that eventually
> reproduce the issue you've fixed? I don't think reproducibility would
> need to be 100% for the test to be useful, fwiw.
>
> Note that I'm assuming we don't have something like that already. I see
> growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> groups and I haven't looked through the individual tests. Just a
> thought.

It turns out reproducing this bug was surprisingly complicated. After a
growfs we can now dip into reserves, which made the test1 file start
filling up the existing AGs first for a while, so the error injection
would hit there and never even reach a new AG.

So while I agree with your sentiment and like the high-level idea, I
suspect it will need a fair amount of work to actually be useful. Right
now I'm too busy with various projects to look into it, unfortunately.