Hi,
I first sent this email to pg@xxxxxxxxxxxxxxxxxxxxx since it was the
list I sent my previous emails to. However, I am unsure whether it
really reached the list.
Sorry it took me 1.5 years to respond to this (see below). I have been
using another computer in the meantime, so I have not been stuck since
then. :) My daughter has just started kindergarten, so I have a bit more
time now. :)
See my answers below.
2017-07-31 15:20 skrev pg@xxxxxxxxxxxxxxxxxxxxx:
[ ... ]
But as far as I understand it is not possible to mount a
previous snapshot as writable if there are snapshots/checkpoints
after this snapshot. Since I only get a filesystem error when
mounting a snapshot writable,
That seems unlikely to me. After mounting read-only, check whether
the whole filetree can be accessed error-free, with something like
find $DIR -xdev -perm /07777 | wc -l
for metadata and then for data too:
tar -f /dev/zero -c --one-file-system $DIR
The metadata test worked well, i.e. without any errors.
The other test resulted in:
root@ubuntu:/mnt/home/mikael# tar -f /dev/zero -c --one-file-system /mnt
...
tar: /mnt/var/log/journal/abdb30cb66eb43ec8f9c05e1bc6e2af5/system@06f2a50856274666b1535caf10c332fa-0000000000000001-00054a024e49b482.journal: Read error at byte 0, while reading 9728 bytes: Input/output error
tar: /mnt/nix/var/nix/daemon-socket/socket: socket ignored
tar: Removing leading `/' from hard link targets
tar: /mnt/nix/store/5n2f3kak5vf8978h98kw3zq5p191cvyl-ghostscript-fonts/n021003l.pfb: Read error at byte 0, while reading 7680 bytes: Input/output error
tar: Exiting with failure status due to previous errors
root@ubuntu:/mnt/home/mikael#
I suppose this does not look so good. Do you want me to send you some
more information regarding the problem or should I just remove the
newest checkpoint and see if that helps?
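In case it helps, here is a minimal sketch of the same data check that
prints only the files that cannot be read, instead of the full tar
output. It assumes a POSIX shell with find and cat; the function name
list_unreadable is made up for this example.

```shell
#!/bin/sh
# Sketch: print every regular file under a directory that cannot be read in full.
# list_unreadable is a hypothetical helper name, not part of tar or nilfs-utils.
list_unreadable() {
    dir=$1
    # -xdev keeps find on one filesystem, like tar's --one-file-system.
    find "$dir" -xdev -type f -exec sh -c '
        for f in "$@"; do
            # Read the whole file and discard the data; print the path if the read fails.
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Example (not run here): list_unreadable /mnt > unreadable-files.txt
```

Empty output would mean every regular file could be read to the end.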
Eventually, if you can find a checkpoint/snapshot that is
error-free, you can delete any newer corrupted ones and mount that
one read-write. Ideally you would do a nice backup before doing
that.
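A rough sketch of how one older checkpoint could be tested read-only,
assuming the nilfs-utils chcp tool and the nilfs2 cp= mount option. The
device, mount point and checkpoint number are placeholders, the function
name is made up, and the function only prints the commands rather than
running them.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands for checking one older checkpoint.
# print_checkpoint_test is a hypothetical helper; all arguments are placeholders.
print_checkpoint_test() {
    dev=$1; mnt=$2; cno=$3
    # A plain checkpoint has to be changed into a snapshot before it can be mounted.
    echo "chcp ss $dev $cno"
    # Mount that snapshot read-only through the cp= option.
    echo "mount -t nilfs2 -o ro,cp=$cno $dev $mnt"
    # Read all file data so that I/O errors show up.
    echo "tar -f /dev/zero -c --one-file-system $mnt"
    echo "umount $mnt"
}

# Placeholder values; lscp on the device lists the real checkpoint numbers.
print_checkpoint_test /dev/sdX1 /mnt 1234
```

Printing first makes it possible to review each command before running
anything against a filesystem that may already be damaged.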
If I keep removing checkpoints with errors until I reach an error-free
one, should I not be able to simply reboot the system after that? Does
NILFS not then automatically mount and continue from the last error-free
checkpoint?
If you cannot find any that is error-free, probably that was
either a grievous IO error (most likely lack of proper barriers)
or the consequences of that recently discovered bug, if you are
very unlucky.
Usually the second newest checkpoint/snapshot is error-free when
a system crashes and the newest has errors; that is, usually
only the newest checkpoint is invalid.
--
Kind regards
Mikael Andersson