Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)

pg@xxxxxxxxxxxxxxxxxxxxx (Peter Grandi) · Fri, 31 Dec 2021 11:43:54 +0000

[...]
>> use 'lscp /dev/...' to list the checkpoints and try to mount
>> an older checkpoint with 'mount -t nilfs2 -o cp=... /dev/...
>> ...' to mount it and resume work from that. In theory older
>> checkpoints will be fully consistent even if the latest one
>> is corrupted.

> Thanks Peter, it seems both lscp and mount -o cp need a
> functioning super block though.

If the superblock is gone, that is a rather unlucky situation.
But note that NILFS2 keeps a redundant copy of the superblock,
like most other filesystem types. This is described here:

  https://github.com/nilfs-dev/nilfs2-kmod7/blob/master/fs/nilfs2/the_nilfs.c#L490

This mailing list thread may be particularly relevant:

  https://www.mail-archive.com/linux-nilfs@xxxxxxxxxxxxxxx/msg01438.html
  https://www.mail-archive.com/linux-nilfs@xxxxxxxxxxxxxxx/msg01239.html
  https://www.mail-archive.com/linux-nilfs@xxxxxxxxxxxxxxx/msg01238.html

In my experience NILFS2 has never corrupted a superblock, so an
external cause is most likely.
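
To illustrate the redundant superblock mentioned above, here is a
minimal Python sketch that checks whether either copy still
carries the NILFS2 magic number. The constants (magic 0x3434,
primary copy at byte 1024, secondary copy at the start of the
last full 4096-byte block, magic field 6 bytes into the
superblock) are my reading of the on-disk header and should be
treated as assumptions to verify against the kernel sources; the
function names are made up for this example. It only looks at the
magic number and does not validate the superblock checksum.

```python
import struct

# Assumed values, taken from my reading of the NILFS2 on-disk header;
# verify them against the kernel sources before relying on them.
NILFS_SUPER_MAGIC = 0x3434   # expected s_magic value
PRIMARY_SB_OFFSET = 1024     # byte offset of the primary superblock
MAGIC_FIELD_OFFSET = 6       # after s_rev_level (4) and s_minor_rev_level (2)


def secondary_sb_offset(dev_size: int) -> int:
    """Byte offset of the last full 4096-byte block of the device."""
    return ((dev_size >> 12) - 1) << 12


def has_magic(image: bytes, sb_offset: int) -> bool:
    """True if a little-endian NILFS2 magic sits at the given superblock."""
    if sb_offset < 0:
        return False  # device too small to hold this copy
    start = sb_offset + MAGIC_FIELD_OFFSET
    field = image[start:start + 2]
    if len(field) < 2:
        return False
    (magic,) = struct.unpack("<H", field)
    return magic == NILFS_SUPER_MAGIC


def probe_superblocks(image: bytes) -> dict:
    """Report which of the two superblock copies still has a valid magic."""
    return {
        "primary": has_magic(image, PRIMARY_SB_OFFSET),
        "secondary": has_magic(image, secondary_sb_offset(len(image))),
    }
```

This takes the whole image as bytes to keep the example short; on
a real device one would seek to the two offsets and read only a
few bytes at each.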

> I might dig into this a little deeper, the data isn't that
> important but gaining a correct understanding of NILFS working
> principles is. My understanding so far was that it's quite
> hard for data to become entirely inaccessible.

The same holds for most other filesystem types, but for log
structured ones it is even harder. The NILFS2 idea is that since
all metadata blocks are checksummed, one can just roll back to a
checkpoint where all checksums are valid, and then the filesystem
is consistent up to that point. This does not protect against
most cases of data corruption, damage to the superblock, or
widespread damage to metadata (in the latter case it may be
impossible to find a sequence of segments with valid checksums).
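
The rollback idea can be sketched with a toy model in Python.
This is not NILFS2's actual on-disk format or recovery code: the
(payload, checksum) pair, the use of zlib.crc32 and all names
below are assumptions made for illustration only. It walks the
log from the oldest entry and stops at the first checksum
mismatch, returning the newest point up to which everything
verifies.

```python
import zlib
from typing import List, Optional, Tuple

# Hypothetical toy layout: each log entry is (payload, stored checksum).
Segment = Tuple[bytes, int]


def make_segment(payload: bytes) -> Segment:
    """Build a log entry whose stored checksum matches its payload."""
    return payload, zlib.crc32(payload)


def is_valid(segment: Segment) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    payload, stored = segment
    return zlib.crc32(payload) == stored


def latest_consistent(log: List[Segment]) -> Optional[int]:
    """Index of the newest entry such that it and all older ones verify.

    Returns None when even the oldest entry is damaged, which mirrors
    the case where no usable sequence of segments can be found.
    """
    last_good = None
    for index, segment in enumerate(log):
        if not is_valid(segment):
            break
        last_good = index
    return last_good
```

With a log of four entries where the third has a bad checksum,
latest_consistent() returns 1: the state as of the second entry
is the one to roll back to, and the fourth entry is ignored even
though its own checksum is fine.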

NILFS2 has some interesting recovery logic here:

  https://github.com/nilfs-dev/nilfs2-kmod7/blob/master/fs/nilfs2/recovery.c

> This looks like a good idea, linear scan for segment nodes:
> https://www.spinics.net/lists/linux-nilfs/msg02198.html Could
> be the start of the fsck that never happened.

That is not quite an 'fsck' but a recovery tool; many 'fsck'
implementations also attempt a bit of recovery, but their
primary function is to repair metadata in case of partial
writes, which because of the checksums mentioned above is not
necessary for NILFS2. The same argument is used for ZFS, which
is "log based" or "log inspired".

I find the lack of 'fsck' for NILFS2 and ZFS a mild issue:
whether or not a filesystem type needs a repair tool, another
core function of 'fsck' is as an auditing tool, to be run
periodically even if there are no known issues (ZFS "resilvering"
is not a full audit). But then how many people nowadays regularly
run 'fsck', where it is available, as an auditing tool even if
there are no known issues?

One of the most profound quotes in the history of information
engineering:

  "As far as we know, our computer has never had an undetected
  error" Conrad H. Weisert (Union Carbide Corporation) in
  "Datamation" (1969)