Re: Crash consistency bug in ext4 - interaction between delalloc and fzero

Lukas Czerner <lczerner@xxxxxxxxxx> · Tue, 13 Mar 2018 13:23:13 +0100

On Mon, Mar 12, 2018 at 08:50:02PM -0500, Jayashree Mohan wrote:
> Hi,

Hi,

thanks for the report.

> 
> We've encountered what seems to be a crash consistency bug in
> ext4 (kernel 4.15) due to the interaction between a delayed-allocation
> write and an unaligned fallocate (zero range). Say we create a disk
> image with known data and quick format it.
> 1. Now write 65K of data to a new file
> 2. Zero out a part of the above file using fallocate with
> FALLOC_FL_ZERO_RANGE over (60K+128) to (60K+128+4096), an unaligned range
> 3. fsync the above file
> 4. Crash
> 
> If we crash after the fsync, and allow reordering of the block IOs
> between two flush/fua commands using CrashMonkey [1], then we can end
> up zeroing the file range from (64K+128) to 65K, which should be
> untouched by the fallocate command. We expect this region to contain
> the user-written data from step 1 above.
> 
> This workload was inspired by xfstest/generic_042, which tests for
> stale data exposure using aligned fallocate commands. It's worth
> noting that f2fs and btrfs pass our test cleanly: irrespective of the
> order of bios, user data is intact in these filesystems.
> 
> To reproduce this bug using CrashMonkey, simply run:
> ./c_harness -f /dev/sda -d /dev/cow_ram0 -t ext4 -e 10240 -s 1000 -v
> tests/generic_042/generic_042_fzero_unaligned.so

Hmm, I do not seem to be able to reproduce this problem. However, I am
running in a virtual environment with a Virtio disk, so that might be the
problem? Sorry if I am missing something; it's my first time trying
CrashMonkey.

Also, it's not yet clear to me how we could zero out the entire block
instead of just a part of it because of the crash. Unless it was actually
zero before we wrote to it, in which case isn't it a lost write rather
than a zeroout?
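
To make the question concrete, here is a minimal sketch of the byte
arithmetic for the reported ranges. It assumes a 4 KiB filesystem block
size, which the report does not state, and the helper name block_spans is
only illustrative. It shows which blocks the zero range touches and which
block holds the region reported as lost.

```python
# Assumption: 4 KiB filesystem block size (not stated in the report).
BLOCK = 4096
K = 1024

file_size = 65 * K               # step 1: 65K written to a new file
zero_start = 60 * K + 128        # step 2: start of the zero range
zero_end = zero_start + 4096     # end of the zero range (exclusive)

# Region reported as unexpectedly zeroed after the crash.
lost_start = 64 * K + 128
lost_end = 65 * K


def block_spans(start, end, block=BLOCK):
    """Split [start, end) into (block_index, offset_in_block, length) pieces."""
    spans = []
    pos = start
    while pos < end:
        idx = pos // block
        piece_end = min(end, (idx + 1) * block)
        spans.append((idx, pos - idx * block, piece_end - pos))
        pos = piece_end
    return spans


zeroed = block_spans(zero_start, zero_end)
lost = block_spans(lost_start, lost_end)

print("zero range pieces:", zeroed)
print("lost region pieces:", lost)
```

Under that assumption the zero range covers the last 3968 bytes of block
15 and the first 128 bytes of block 16, and the lost region is the rest of
the written data in block 16. So the end state looks like block 16 being
entirely zero, which would fit either a whole-block zeroout or a lost write
of that block.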

I think the comments from Dave are valid here as well. I am not
sure how this situation can happen anyway, so maybe we do have a bug
there somewhere. I guess I'll know more once I am able to reproduce it.

Thanks!
-Lukas

> 
> and take a look at the <timestamp>-generic_042_fzero_unaligned.log
> created in the build directory. This file has the list of block IOs
> issued during the workload and the permutation of bios that leads to
> this bug. You can also verify using blktrace that CrashMonkey only
> reorders bios between two barrier operations (thereby such a crash
> state could be encountered due to reordering blocks at the storage
> stack). Note that tools like dm-log-writes cannot capture this bug
> because this arises due to reordering blocks between barrier
> operations.
> 
> This seems to be a bug, as it is zeroing out user data that is ideally
> not supposed to be zeroed by the fallocate command.
> Let me know if I am missing some detail here.
> 
> [1] https://github.com/utsaslab/crashmonkey.git
> 
> Thanks,
> Jayashree Mohan