Re: ext4 fsck vs. kernel recovery policy

dann frazier <dann.frazier@xxxxxxxxxxxxx> · Thu, 29 Aug 2019 16:53:48 -0600

On Tue, Aug 27, 2019 at 02:27:25PM -0600, Andreas Dilger wrote:
> On Aug 27, 2019, at 1:10 PM, dann frazier <dann.frazier@xxxxxxxxxxxxx> wrote:
> > 
> > hey,
> >  I'm curious if there's a policy about what types of unclean
> > shutdowns 'e2fsck -p' can recover, vs. what the kernel will
> > automatically recover on mount. We're seeing that unclean shutdowns w/
> > data=journal,journal_checksum frequently result in invalid checksums that
> > cause the kernel to abort recovery, while 'e2fsck -p' resolves the
> > issue non-interactively.
> 
> The kernel journal recovery will only replay the journal blocks.  It
> doesn't do any check and repair of filesystem correctness.  During and
> after e2fsck replays the journal blocks it still does basic correctness
> checking, and if an error is found it will fall back to a full scan.

hey Andreas!

Here's a log to clarify what I'm seeing:

$ sudo mount /dev/nbd0 mnt
JBD2: Invalid checksum recovering data block 517634 in log
JBD2: Invalid checksum recovering data block 517633 in log
[...]
JBD2: Invalid checksum recovering data block 517004 in log
JBD2: Invalid checksum recovering data block 4915712 in log
JBD2: recovery failed
EXT4-fs (nbd0): error loading journal
mount: /tmp/mnt: can't read superblock on /dev/nbd0.
$ sudo e2fsck -p /dev/nbd0 
/dev/nbd0: recovering journal
JBD2: Invalid checksum recovering block 517732 in log
JBD2: Invalid checksum recovering block 517519 in log
[...]
JBD2: Invalid checksum recovering block 4915712 in log
Journal checksum error found in /dev/nbd0
/dev/nbd0: Clearing orphaned inode 128798 (uid=0, gid=0, mode=040600, size=4096)
/dev/nbd0: Clearing orphaned inode 514998 (uid=0, gid=0, mode=040600, size=4096)
[...]
/dev/nbd0: Clearing orphaned inode 774759 (uid=0, gid=0, mode=0100600, size=4096)
/dev/nbd0 was not cleanly unmounted, check forced.
/dev/nbd0: 2127984/2195456 files (0.0% non-contiguous), 2963178/8780544 blocks

So is it correct to say that the checksum errors were identifying
filesystem correctness issues, and therefore e2fsck was needed to
correct them?
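
For anyone scripting the fallback seen in the log above, here is a minimal sketch of how the exit status of 'e2fsck -p' could be classified. The bit values (1 = errors corrected, 2 = corrected and a reboot is advised, 4 = errors left uncorrected, 8 and up = operational or usage errors) are the ones documented in e2fsck(8). The helper name fsck_outcome is hypothetical and not part of e2fsprogs.

```shell
# Classify an e2fsck exit status, which is a bitmask per e2fsck(8).
# fsck_outcome is a hypothetical helper name used for illustration only.
fsck_outcome() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"
    elif [ $((status & ~3)) -ne 0 ]; then
        # Any bit above 2 is set: errors left uncorrected (4),
        # or an operational/usage/cancel/library error (8 and up).
        echo "failed"
    elif [ $((status & 2)) -ne 0 ]; then
        # Errors were corrected, but the system should be rebooted.
        echo "fixed-reboot"
    else
        # Only bit 1 is set: errors were corrected.
        echo "fixed"
    fi
}

# Usage sketch (needs root and a real device, so it is left commented out):
#   e2fsck -p /dev/nbd0; fsck_outcome $?
```

In the run above, e2fsck repaired the filesystem without prompting, so a status with only the low bits set would be the expected result; the log does not show the actual value.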

> > Driver for this question is that some Ubuntu installs set fstab's
> > passno=0 for the root fs - which I'm told is based on the assumption
> > that both kernel & e2fsck -p have parity when it comes to automatic
> > recovery - that obviously does not appear to be the case - but I
> > wanted to confirm whether or not that is by design.
> 
> The first thing to figure out is why there are errors with the journal
> blocks.  That can cause problems for both the kernel and e2fsck journal
> replay.
> 
> Using data=journal is not a common option, so it is likely that the
> issue relates to this.

You're probably right - this issue is very easy to reproduce w/
data=journal,journal_checksum. I was never able to reproduce it
otherwise.

> IMHO, using data=journal could be helpful for
> small file writes and/or sync IO, but there have been discussions lately
> about removing this functionality.  If you have some use case that shows
> real improvements with data=journal, please let us know.

I don't have such a use case myself. The issue was reported by a user,
and it got me wondering about the basis for our passno=0 default.
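
To see which installs are affected, something like the following could report the passno (sixth fstab field) for a mount point. This is a rough sketch: the function name fstab_passno is made up for illustration, and it assumes whitespace-separated fields as described in fstab(5), where a missing sixth field is treated as 0.

```shell
# Print the fs_passno (6th fstab field) for a given mount point.
# A missing sixth field is reported as 0, matching fstab(5).
# fstab_passno is a hypothetical helper name.
fstab_passno() {
    fstab=$1
    mountpoint=$2
    awk -v mp="$mountpoint" '
        # Skip comment lines; match on the mount point in field 2.
        $1 !~ /^#/ && $2 == mp { print (NF >= 6 ? $6 : 0); exit }
    ' "$fstab"
}

# Usage sketch:
#   fstab_passno /etc/fstab /
```

A result of 0 means fsck skips that filesystem at boot, so recovery is left to the kernel's journal replay alone. fstab(5) suggests 1 for the root filesystem and 2 for the others.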

  -dann