On Tue, Mar 13, 2018 at 10:05:57AM -0500, Jayashree Mohan wrote:
> Hi,
> Thanks for the quick reply.
>
> >> We've encountered what seems to be a crash-consistency bug in
> >> ext4 (kernel 4.15) due to the interaction between a delayed-allocation
> >> write and an unaligned fallocate (zero range). Say we create a disk
> >> image with known data and quick-format it.
> >> 1. Now write 65K of data to a new file.
> >> 2. Zero out a part of the above file using fallocate zero range over
> >>    (60K+128) - (60K+128+4096) - an unaligned block.
> >> 3. fsync the above file.
> >> 4. Crash.
> >>
> >> If we crash after the fsync, and allow reordering of the block I/Os
> >> between two flush/FUA commands using CrashMonkey [1], then we can end
> >> up zeroing the file range from (64K+128) to 65K, which should be
> >> untouched by the fallocate command. We expect this region to contain
> >> the data written by the user in step 1 above.
> >>
> >> This workload was inspired by xfstests generic/042, which tests for
> >> stale data exposure using aligned fallocate commands. It is worth
> >> noting that f2fs and btrfs pass our test cleanly - irrespective of
> >> the order of the BIOs, user data is intact on those filesystems.
> >>
> >> To reproduce this bug using CrashMonkey, simply run:
> >> ./c_harness -f /dev/sda -d /dev/cow_ram0 -t ext4 -e 10240 -s 1000 -v
> >> tests/generic_042/generic_042_fzero_unaligned.so
> >
> > Hmm, I do not seem to be able to reproduce this problem. However, I am
> > running in a virtual environment with a virtio disk, so that might be
> > the problem? Sorry if I am missing something - it is my first time
> > trying CrashMonkey.
>
> By not being able to reproduce the problem, do you mean CrashMonkey
> runs to completion and produces a summary block like this one, but
> with all tests passing cleanly?
>
> Reordering tests ran 1000 tests with
>     passed cleanly: 936
>     passed fixed: 0
>     fsck required: 0
>     failed: 64
>         old file persisted: 0
>         file missing: 0
>         file data corrupted: 64
>         file metadata corrupted: 0
>         incorrect block count: 0
>         other: 0
>
> If not, could you tell me what the output is?
> We also run in a virtual environment - KVM or VirtualBox - so there
> shouldn't be an issue with that.

Ah, ok. The output tricked me; you're right, it does fail for me in the
same way.

> >
> > Also it's not yet clear to me whether we can zero out the entire
> > block, instead of just a part of it, because of the crash. Unless it
> > was actually zero before we wrote to it - but then isn't it a lost
> > write rather than a zero-out?
>
> Before we start the workload, we run a setup phase that fills up the
> entire disk by writing known (non-zero) data to a file and then
> unlinking the file, so that when we run the actual workload it reuses
> those data blocks. So I am wondering whether the only way the block
> could be zeroed out is the zero-range command (because if the write
> were lost, we should instead see stale data from the initial setup
> phase?).

Ok, good to know. Thanks!
-Lukas

> >
> > I think that the comments from Dave are valid here as well. I am not
> > necessarily sure how this situation can happen anyway, so maybe we do
> > have a bug there somewhere. I guess I'll know more once I am able to
> > reproduce it.
>
> Thanks,
> Jayashree