reproducible corruption in journal

"Seamus Connor" <seamus@xxxxxxxxxxxxxxxxx> · Tue, 23 Feb 2021 16:41:20 -0800

Hello All,

I am investigating an issue on our system where a filesystem is becoming corrupt. The underlying block device is provided by us, so we were suspicious that we were screwing something up. However, after gathering a repro of the issue with write logging (a slightly modified version of dm-log-writes), it looks like jbd2 is actually the one running into problems. Up to entry 2979 in our (dm-log-writes) log file, things seem OK:

    $ replay-log --log write_log --replay base_filesystem.ext4.tmp --limit 2979
    $ debugfs base_filesystem.ext4.tmp -f <(echo logdump) | tail
    debugfs 1.44.1 (24-Mar-2018)
    Found expected sequence 12821, type 1 (descriptor block) at block 1791
    Found expected sequence 12821, type 2 (commit block) at block 1795
    Found expected sequence 12822, type 1 (descriptor block) at block 1796
    Found expected sequence 12822, type 2 (commit block) at block 1802
    Found expected sequence 12823, type 1 (descriptor block) at block 1803
    Found expected sequence 12823, type 2 (commit block) at block 1810
    Found expected sequence 12824, type 1 (descriptor block) at block 1811
    Found expected sequence 12824, type 2 (commit block) at block 1815
    Found expected sequence 12825, type 1 (descriptor block) at block 1816
    No magic number at block 1821: end of journal.

However once we replay the next IO things get a little weirder. 

    $ ...--limit 2980 ...
    Found expected sequence 12825, type 2 (commit block) at block 1821
    Found sequence 12824 (not 12826) at block 1822: end of journal.

And finally after replaying the next IO, it looks like a bunch of old entries in the journal get resurrected:

    $ ...--limit 2981 ...
    Found expected sequence 15454, type 1 (descriptor block) at block 15352
    Found expected sequence 15454, type 2 (commit block) at block 15356

If we fsck our filesystem image at IO 2981 or later, we get a bunch of errors. Before that it is clean. IO 2981 touches as single 4k block in inode 8's data blocks. When this corruption was produced, the filesystem is mounted rw,relatime,sync.

This issue is reproduced by our workload, which is a smallish volume of writes every 30 seconds or so. Once the files are written, they are never modified, and are eventually deleted in a rotation style. During a torture test, we disable IO to the underlying block device, and before re-enabling it we fsck the contents of the device. It is here that we detect the corruption. We are detecting a corruption in around 1% of the events here.

Our kernel is based on Ubuntu 16.04's 4.4 series kernel. We have many patches on top of this, but none inside of mm and fs, and none in block that should be relevant. The issue also reproduces on Linux 5.4 + our patch series, though I have not studied any reproductions there.

My next step is to start going though the jbd2 code to figure out how it could be corrupted. Any assistance would be appreciated as I am mostly unfamiliar with this code. Does any one have some pointers at what to look at, or any recollection of similar issues? I can extract more information from the logs if it would help. I am also happy to share the images of the filesystem and the write log if someone wants to look at it.

Thanks,
Seamus