On 10/18/24 4:42 PM, Adam Williamson wrote:
> Hey folks! I'm sending up a flare for help with a potential F41 blocker
> that looks pretty tricky. It is
> https://bugzilla.redhat.com/show_bug.cgi?id=2318710 .
>
> The problem is fairly easy to reproduce. Install Fedora 40 or 41 Beta
> with an ext4 root partition, take a snapshot (for convenience in
> testing), then do an offline upgrade to current F41 (or offline update
> any one of a specific list of packages that triggers the issue, which
> Kamil Paral worked out - see
> https://bugzilla.redhat.com/show_bug.cgi?id=2318710#c14 ). On the boot
> after the offline upgrade runs, you'll drop to emergency mode, with the
> system complaining about 'ext4 bad orphan inode' issues. But if you
> just reboot from this state, the system will then boot up fine.
>
> This only seems to happen on ext4, it's not affecting installs to xfs
> or btrfs. But we suspect there are still quite a few people out there
> with their root partition on ext4, so we're worried this might have to
> block the release.
>
> It's a pretty odd bug. We can't see anything much in common between the
> packages that trigger it - no files in weird places, no odd scripts.
> The failure case itself is pretty weird. Fabio had a good theory that
> it might be caused by the rpm-plugin-ima package, but sadly testing I
> did today indicates that is not the case.
>
> If anyone has any bright ideas what might be going on here, please do
> reply or add them to the bug! Thanks.

Hm, for starters, from the bug:

> The logs contain:
>
> systemd-fsck[489]: /dev/vda3: recovering journal
> systemd-fsck[489]: /dev/vda3: Clearing orphaned inode 295083 (uid=0, gid=0, mode=0100755, size=60800)

...

Why does the root filesystem require recovery at all? Why was root not
cleanly unmounted / remounted readonly on the prior reboot? Might be worth
looking at the reboot logs before this boot error.

But then ...
> kernel: EXT4-fs (vda3): orphan cleanup on readonly fs
> kernel: EXT4-fs error (device vda3): ext4_orphan_get:1421: comm mount: bad orphan inode 295083
> kernel: ext4_test_bit(bit=170, block=1048596) = 0

Ok, well, fsck *just said* it had cleared that inode. :/

> Could the issue lie with pk-offline-update? Seems like it is rebooting too quickly
> after the packages are updated; before the filesystem is stable.

Not sure what that means, but hints at my "why is the journal being
replayed? why was the root fs not quiesced on the reboot?" question above.

> Of course the journal shouldn't need to be recovered in the first place

correct ...

Can anyone get a metadata image (e2image -Q /path/to/root/device image.qcow2)
post-upgrade, before reboot tries to run fsck and fix things? I can try to
get to reproducing this but if it's easy for anyone else, please make that
e2image, compress it, and stick it on the bug if it fits.

Upgrade failing to properly reboot the system (leaving the root fs in a
state that needs recovery) may be the core problem here, but that said,
fsck and/or log recovery should still yield a consistent filesystem even in
that case, and apparently it is not.

Thanks,
-Eric
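For anyone cross-checking the two log excerpts quoted above, here is a minimal Python sketch that pulls the inode numbers out of the fsck and kernel messages and maps each one to an inode-bitmap position. The log text is copied from the bug. The helper names are made up for illustration, and the value of 8192 inodes per block group is an assumption: it is not stated anywhere in the thread, and was picked only because it reproduces the logged bit=170. The real value would come from the superblock of the affected filesystem.

```python
import re

# Log lines copied from the bug report quoted in this thread.
LOG = """\
systemd-fsck[489]: /dev/vda3: recovering journal
systemd-fsck[489]: /dev/vda3: Clearing orphaned inode 295083 (uid=0, gid=0, mode=0100755, size=60800)
kernel: EXT4-fs (vda3): orphan cleanup on readonly fs
kernel: EXT4-fs error (device vda3): ext4_orphan_get:1421: comm mount: bad orphan inode 295083
kernel: ext4_test_bit(bit=170, block=1048596) = 0
"""

# ASSUMPTION: inodes per block group. Not given in the thread; 8192 is used
# here only because it matches the bit=170 value the kernel printed.
INODES_PER_GROUP = 8192


def cleared_by_fsck(log):
    """Inode numbers that fsck reported clearing from the orphan list."""
    return {int(n) for n in re.findall(r"Clearing orphaned inode (\d+)", log)}


def rejected_by_kernel(log):
    """Inode numbers the kernel flagged as bad orphans at mount time."""
    return {int(n) for n in re.findall(r"bad orphan inode (\d+)", log)}


def bitmap_position(ino, inodes_per_group=INODES_PER_GROUP):
    """Return (block group, bit index) for an inode; inode numbers start at 1."""
    return divmod(ino - 1, inodes_per_group)


# Inodes that fsck cleared and the kernel then still tried to process.
both = sorted(cleared_by_fsck(LOG) & rejected_by_kernel(LOG))

for ino in both:
    group, bit = bitmap_position(ino)
    print(f"inode {ino}: cleared by fsck, then rejected by kernel "
          f"(group {group}, bitmap bit {bit})")
```

Under that assumption the bit index agrees with the kernel's ext4_test_bit message. A result of 0 there means the inode is marked free in the bitmap, which fits fsck having cleared it while the mount-time orphan cleanup still found it on the orphan list.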