On Fri, Mar 18, 2022 at 4:47 PM Paolo Galtieri <pgaltieri@xxxxxxxxx> wrote:
>
> I'm having issues with a VM.
>
> The VM was originally created under VMware and has worked fine for a
> while. Today when I booted it up instead of seeing the usual MATE login
> screen I get a login prompt:
>
> f34-01-vm:
>
> no matter what I enter, root or pgaltieri as login it never asks for
> password and immediately says login incorrect. While it's booting I see
> several [FAILED]... messages, e.g. [FAILED] to start CUPS Scheduler
>
> I booted the system again and this time it dropped into emergency mode.
> In emergency mode I see the following messages in dmesg:
>
> BTRFS info (device sda2): flagging fs with big metadata feature
> BTRFS info (device sda2): disk space caching is enabled
> BTRFS info (device sda2): has skinny extents
> BTRFS info (device sda2): start tree-log replay
> BTRFS info (device sda2): parent transid verify failed on 61849600
> wanted 145639 found 145637
> BTRFS info (device sda2): parent transid verify failed on 61849600
> wanted 145639 found 145637
> BTRFS: error (device sda2) in btrfs_replay_log:2423 errno=-5 IO failure
> (Failed to recover log tree)
> BTRFS error (device sda2) open_ctree failed

That's not good. The tree-log is used during fsync as an optimization
to avoid having to do full file system metadata updates. Since the
tree-log exists, we know this file system was undergoing some fsync
write operations which were then interrupted. Either the VM or host
crashed, or one of them was forced to shut down, or there's a bug that
otherwise prevented the guest operations from completing.

Further, the parent transid verification failure messages indicate
some out of order writes, as if the virtual drive+controller+cache is
occasionally ignoring flush/FUA requests.

I regularly use a qemu-kvm VM with cache mode "unsafe". The VM can
crash all day long and at most I lose ~30s of the most recent writes,
depending on the fsync policy of the application doing the writes. The
file system otherwise mounts normally following the crash. However, if
the host crashes while the guest is writing, that file system can be
irreparably damaged. This is expected. So you might want to check the
cache policy being used, and make sure that the guest VM is really
shutting down properly before rebooting or shutting down the host.

>
> I ran btrfs check in emergency mode and it came up with a lot of errors.
>
> How do I recover the partition(s) so I can boot the system, or at least
> mount them?

I'd start with

    mount -o ro,nologreplay,rescue=usebackuproot

Followed by

    mount -o ro,nologreplay,rescue=all

The second one is a bit of a heavy hammer, but it's safe insofar as
it mounts the fs read-only and makes no changes. It also disables csum
checking, so any corrupt files still get copied out, and without any
corruption warnings. You can check man 5 btrfs to read a bit more
about the other options and vary the selection. This is pretty much a
recovery operation, i.e. get the important data out.

The repair commands for this particular set of errors:

    btrfs rescue zero-log
    btrfs check --repair --init-extent-tree
    btrfs check --repair

I have somewhat low confidence that these will repair the file system
rather than make things worse. So you should start out with the earlier
mount commands to get anything important out of the fs first. If those
don't work and there's important information to get out, you need to
use btrfs restore.
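To make the order of operations concrete, here is a rough sketch of the read-only attempts as a small shell script. It is only an illustration, not a tested recovery tool. The device /dev/sda2 is taken from the dmesg output above; the mount point /mnt, the destination directory and the DRY_RUN switch are my assumptions. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. DEV, MNT and DEST are assumed values; adjust them.
DEV="${DEV:-/dev/sda2}"          # device named in the dmesg output
MNT="${MNT:-/mnt}"               # assumed mount point
DEST="${DEST:-/path/to/backup}"  # hypothetical directory on a healthy disk
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands

try_rescue_mounts() {
    # Least invasive option set first, the "heavy hammer" second.
    for opts in "ro,nologreplay,rescue=usebackuproot" \
                "ro,nologreplay,rescue=all"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "mount -o $opts $DEV $MNT"
        elif mount -o "$opts" "$DEV" "$MNT"; then
            echo "mounted $DEV read-only with: $opts"
            return 0
        fi
    done
    # Reached when no mount succeeded, and always in a dry run:
    # print the offline fallback rather than running it.
    echo "btrfs restore $DEV $DEST"
    # Exit status 0 for a dry run, non-zero if real mounts failed.
    [ "$DRY_RUN" = "1" ]
}

try_rescue_mounts
```

Once the printed commands look right, running it as root with DRY_RUN=0 from the emergency shell or a live image tries the mounts for real and stops at the first one that succeeds. Copy the data out before attempting any of the repair commands.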
--
Chris Murphy