On Mon, May 29, 2023 at 10:14:51PM +0100, Matthew Wilcox wrote:
> On Mon, May 29, 2023 at 04:59:40PM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > I improved the dm-flakey device mapper target, so that it can do random
> > corruption of read and write bios - I uploaded it here:
> > https://people.redhat.com/~mpatocka/testcases/bcachefs/dm-flakey.c
> >
> > I set up dm-flakey, so that it corrupts 10% of read bios and 10% of
> > write bios with this command:
> > dmsetup create flakey --table "0 `blockdev --getsize /dev/ram0` flakey /dev/ram0 0 0 1 4 random_write_corrupt 100000000 random_read_corrupt 100000000"
>
> I'm not suggesting that any of the bugs you've found are invalid, but 10%
> seems really high. Is it reasonable to expect any filesystem to cope
> with that level of broken hardware? Can any of our existing ones cope
> with that level of flakiness? I mean, I've got some pretty shoddy USB
> cables, but ...

It's realistic in the sense that when you have lots of individual
storage devices with IO load balanced over all of them and one of them
fails completely, we'll see an IO error rate like this.

These are the sorts of setups I'd expect to be using erasure coding
with bcachefs, so the IO failure rate should be able to head towards
20-30% before actual data loss and/or corruption starts occurring.

In this situation, if the failures were isolated to an individual
device, then I'd want the filesystem to kick that device out of the
backing pool. At that point all the failures go away and the redundancy
provided by the erasure coding can be rebuilt. i.e. an IO failure rate
this high should be a very short-lived incident for a filesystem that
directly manages individual devices.

But within a single, small device, it's not a particularly realistic
scenario. If it's really corrupting this much active metadata, then the
filesystem should be shutting down at the first
uncorrectable/unrecoverable metadata error, and every IO error after
that is superfluous.

Of course, bcachefs might be doing just that - cleanly shutting down an
active filesystem is a very hard problem. XFS still has intricate and
subtle issues with shutdown of active filesystems that can cause hangs
and/or crashes, so I wouldn't expect bcachefs to handle these scenarios
completely cleanly at this stage of its development....

Perhaps it is worthwhile running the same tests on btrfs so we have
something to compare the bcachefs behaviour to. I suspect that btrfs
will fare little better on the single-device, no-checksums corruption
test....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
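
A rough sketch of what such a btrfs comparison run might look like,
reusing the dm-flakey table quoted above; the scratch ram disk, mount
point and fsstress workload are illustrative assumptions rather than
anything from this thread, and btrfs data checksumming is disabled with
the nodatasum mount option to roughly approximate the "no checksums"
bcachefs case (btrfs metadata checksums cannot be turned off):

  # same 10% read/write corruption table as in the quoted test, layered
  # over a scratch ram disk
  dmsetup create flakey --table "0 `blockdev --getsize /dev/ram0` flakey /dev/ram0 0 0 1 4 random_write_corrupt 100000000 random_read_corrupt 100000000"

  # single-device btrfs, data checksums disabled at mount time
  mkfs.btrfs -f /dev/mapper/flakey
  mkdir -p /mnt/scratch
  mount -o nodatasum /dev/mapper/flakey /mnt/scratch

  # hammer the filesystem and watch dmesg for how it reacts to the
  # corrupted reads and writes
  fsstress -d /mnt/scratch -n 100000 -p 8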