On Mon, Feb 03, 2025 at 10:55:19AM -0800, Boris Burkov wrote:
> At Meta, we currently primarily rely on fstests 'auto' runs for
> validating Btrfs as a general purpose filesystem for all of our root
> drives. While this has obviously proven to be a very useful test suite
> with rich collaboration across teams and filesystems, we have observed
> a recent trend in our production filesystem issues that makes us
> question if it is sufficient.
>
> Over the last few years, we have had a number of issues (primarily in
> Btrfs, but at least one notable one in Xfs) that have been detected in
> production, then reproduced with an unreliable non-specific stressor
> that takes hours or even days to trigger the issue.
> Examples:
> - Btrfs relocation bugs
>   https://lore.kernel.org/linux-btrfs/68766e66ed15ca2e7550585ed09434249db912a2.1727212293.git.josef@xxxxxxxxxxxxxx/
>   https://lore.kernel.org/linux-btrfs/fc61fb63e534111f5837c204ec341c876637af69.1731513908.git.josef@xxxxxxxxxxxxxx/
> - Btrfs extent map merging corruption
>   https://lore.kernel.org/linux-btrfs/9b98ba80e2cf32f6fb3b15dae9ee92507a9d59c7.1729537596.git.boris@xxxxxx/
> - Btrfs dio data corruptions from bio splitting
>   (mostly our internal errors trying to make minimal backports of
>   https://lore.kernel.org/linux-btrfs/cover.1679512207.git.boris@xxxxxx/
>   and Christoph's related series)
> - Xfs large folios
>   https://lore.kernel.org/linux-fsdevel/effc0ec7-cf9d-44dc-aee5-563942242522@xxxxxxxx/
>
> In my view, the common threads between these are that:
> - we used fstests to validate these systems, in some cases even with
>   specific regression tests for highly related bugs, but still missed
>   the bugs until they hit us during our production release process. In
>   all cases, we had passing 'fstests -g auto' runs.
> - we were able to reproduce the bugs with a predictable concoction of
>   "run a workload and some known nasty btrfs operations in parallel".
>   The most common form of this was running 'fsstress' and 'btrfs
>   balance', but it wasn't quite universal. Sometimes we needed reflink
>   threads, or drop_caches, or memory pressure, etc. to trigger a bug.
> - The relatively generic stressing reproducers took hours or days to
>   produce an issue; then the investigating engineer could try to tweak
>   and tune it by trial and error to bring that time down for a
>   particular bug.
>
> This leads me to the conclusion that there is some room for
> improvement in stress testing filesystems (at least Btrfs).
>
> I attempted to study the prior art on this and so far have found:
> - fsstress/fsx and the attendant tests in fstests/. There are ~150-200
>   tests using fsstress and fsx in fstests/. Most of them are xfs and
>   btrfs tests following the aforementioned pattern of racing fsstress
>   with some scary operations. Most of them tend to run for 30s, though
>   some are longer (and of course subject to TIME_FACTOR configuration)
> - Similar duration error injection tests in fstests (e.g. generic/475)
> - The NFSv4 Test Project
>   https://www.kernel.org/doc/ols/2006/ols2006v2-pages-275-294.pdf
>   A choice quote regarding stress testing:
>   "One year after we started using FSSTRESS (in April 2005) Linux
>   NFSv4 was able to sustain the concurrent load of 10 processes during
>   24 hours, without any problem. Three months later, NFSv4 reached 72
>   hours of stress under FSSTRESS, without any bugs. From this date,
>   NFSv4 filesystem tree manipulation is considered to be stable."
>
> I would like to discuss:
> - Am I missing other strategies people are employing? Apologies if
>   there are obvious ones, but I tried to hunt around for a few days :)
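For concreteness, the "fsstress + btrfs balance" reproducer pattern
described in the quoted proposal might look roughly like the sketch
below. This is only an illustration: the device, mount point, runtime,
and fsstress flags are assumptions, and "fsstress" here means the
binary built from fstests' ltp/ directory.

    #!/bin/bash
    # Sketch: race a generic fsstress workload against repeated btrfs
    # balance runs (plus periodic cache drops) until time runs out,
    # then check the filesystem offline.
    DEV=/dev/vdb            # placeholder scratch device
    MNT=/mnt/scratch        # placeholder mount point
    RUNTIME=$((12 * 3600))  # 12 hours; tune per bug

    mkfs.btrfs -f "$DEV"
    mount "$DEV" "$MNT"
    mkdir -p "$MNT/stress"

    # Workload: 8 fsstress processes looping forever (-l 0).
    fsstress -d "$MNT/stress" -p 8 -n 100000 -l 0 &
    stress_pid=$!

    end=$((SECONDS + RUNTIME))
    while [ "$SECONDS" -lt "$end" ]; do
            # The "known nasty btrfs operation": a full balance.
            btrfs balance start --full-balance "$MNT"
            # One of the extra stressors mentioned above.
            echo 3 > /proc/sys/vm/drop_caches
    done

    kill "$stress_pid"
    wait
    umount "$MNT"
    btrfs check "$DEV"      # validator: offline fsck

Swapping the balance loop for reflink threads, memory pressure, or
scrub gives the other variants mentioned above.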
At the moment I start six VMs per "configuration", each of which runs
one of:

  generic/521 (directio)
  generic/522 (bufferedio)
  generic/476 (fsstress)
  generic/388 (fsstress + log recovery)
  xfs/285     (online fsck)
  xfs/286     (online metadata rebuild)

with SOAK_DURATION=6.5d so that they wrap up right around the time that
each rc release drops.  I also set FSSTRESS_AVOID="-m 16" so that we
don't end up with gigantic quota files.

There are two "configurations" per kernel tree.  The dot product of
them is:

djwong-dev:
  -m metadir=1,autofsck=1,uquota,gquota,pquota,
  -m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,

tot mainline:
  -m autofsck=1, -d rtinherit=1,
  -m autofsck=1,

for-next:
  -m metadir=1,autofsck=1,uquota,gquota,pquota,
  -m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,

Actually, I just realized that with 6.14 I need to update the tot
mainline configuration to have metadir=1.

> - What is the universe of interesting stressors (e.g., reflink, scrub,
>   online repair, balance, etc.)

Prodding djwong and everyone else into loading up fsx/fsstress with all
their weird new file io calls. ;)

> - What is the universe of interesting validation conditions (e.g.,
>   kernel panic, read only fs, fsck failure, data integrity error,
>   etc.)
> - Is there any interest in automating longer running fsstress runs?
>   Are people already doing this with varying TIME_FACTOR
>   configurations in fstests?

I don't run with SOAK_DURATION > 14 days because I generally haven't
found larger values to be useful in finding bugs.  However, these
weekly long soak test runs have been going since 2016.

FWIW that actually started because we had a lot of customer complaints
in that era about log recovery failures in xfs, and only later did I
spread it beyond generic/388 to the six profiles above.

> - There is relatively less testing with fsx than fsstress in fstests.
>   I believe this creates gaps for data corruption bugs rather than
>   "feature logic" issues that the fsstress feature set tends to hit.

Probably.  I wonder how much we're really flexing io_uring?

--D

> - Can we standardize on some modular "stressors" and stress durations
>   to run to validate file systems?
>
> In the short term, I have been working on these ideas in a separate
> barebones stress testing framework which I am happy to share, but
> isn't particularly interesting in and of itself. It is basically just
> a skeleton for concurrently running some "stressors" and then
> validating the fs with some generic "validators". I plan to run it
> internally just to see if I can get some useful results on our next
> few major kernel releases.
>
> And of course, I would love to discuss anything else of interest to
> people who like stress testing filesystems!
>
> Thanks,
> Boris
>
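For readers less familiar with fstests, the soak setup described in the
reply above could be captured in a local.config roughly like the
following. The device paths are placeholders, and the MKFS_OPTIONS line
is copied from one of the "configurations" listed above (some of those
-m options only exist in the experimental xfs trees being tested):

    # local.config -- illustrative sketch only
    export FSTYP=xfs
    export TEST_DEV=/dev/vdb
    export TEST_DIR=/mnt/test
    export SCRATCH_DEV=/dev/vdc
    export SCRATCH_MNT=/mnt/scratch

    # Let the soak tests (generic/388, generic/476, generic/521,
    # generic/522, xfs/285, xfs/286) run for most of a week.
    export SOAK_DURATION=6.5d

    # Keep fsstress from creating gigantic quota files.
    export FSSTRESS_AVOID="-m 16"

    # One of the per-tree mkfs configurations quoted above.
    export MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,"

Each VM then runs something like "./check generic/476" (or one of the
other five tests) against that config.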