Re: Help wanted with tricky potential F41 blocker

Adam Williamson <adamwill@xxxxxxxxxxxxxxxxx> · Fri, 18 Oct 2024 15:26:41 -0700

On Fri, 2024-10-18 at 17:10 -0500, Eric Sandeen wrote:
> On 10/18/24 4:42 PM, Adam Williamson wrote:
> > Hey folks! I'm sending up a flare for help with a potential F41 blocker
> > that looks pretty tricky. It is
> > https://bugzilla.redhat.com/show_bug.cgi?id=2318710 .
> > 
> > The problem is fairly easy to reproduce. Install Fedora 40 or 41 Beta
> > with an ext4 root partition, take a snapshot (for convenience in
> > testing), then do an offline upgrade to current F41 (or offline update
> > any one of a specific list of packages that triggers the issue, which
> > Kamil Paral worked out - see
> > https://bugzilla.redhat.com/show_bug.cgi?id=2318710#c14 ). On the boot
> > after the offline upgrade runs, you'll drop to emergency mode, with the
> > system complaining about 'ext4 bad orphan inode' issues. But if you
> > just reboot from this state, the system will then boot up fine.
> > 
> > This only seems to happen on ext4, it's not affecting installs to xfs
> > or btrfs. But we suspect there are still quite a few people out there
> > with their root partition on ext4, so we're worried this might have to
> > block the release.
> > 
> > It's a pretty odd bug. We can't see anything much in common between the
> > packages that trigger it - no files in weird places, no odd scripts.
> > The failure case itself is pretty weird. Fabio had a good theory that
> > it might be caused by the rpm-plugin-ima package, but sadly testing I
> > did today indicates that is not the case.
> > 
> > If anyone has any bright ideas what might be going on here, please do
> > reply or add them to the bug! Thanks.
> 
> Hm, for starters, from the bug:
> 
> > The logs contain:
> > 
> > systemd-fsck[489]: /dev/vda3: recovering journal
> > systemd-fsck[489]: /dev/vda3: Clearing orphaned inode 295083 (uid=0, gid=0, mode=0100755, size=60800)
> ...
> 
> Why does the root filesystem require recovery at all? Why was root not
> cleanly unmounted / remounted readonly on the prior reboot? Might be worth
> looking at the reboot logs before this boot error.

They're attached to the bug. The whole shutdown process happens
suspiciously quickly - it's all in the same second the update
transaction ends - but at least on the face of it, everything is
stopped and unmounted in an orderly fashion.

https://bugzilla.redhat.com/attachment.cgi?id=2052143
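
In case it helps anyone going through those logs, here is a minimal Python sketch that pulls the 'Clearing orphaned inode' lines out of journal text and decodes the mode field. The regex is an assumption based only on the single fsck line quoted from the bug, and the names `Orphan` and `parse_orphans` are made up for illustration; none of this comes from the bug or from e2fsprogs itself.

```python
import re
import stat
from typing import List, NamedTuple

# Assumed format, taken from the one line quoted in the bug:
#   Clearing orphaned inode 295083 (uid=0, gid=0, mode=0100755, size=60800)
ORPHAN_RE = re.compile(
    r"Clearing orphaned inode (?P<inode>\d+) "
    r"\(uid=(?P<uid>\d+), gid=(?P<gid>\d+), "
    r"mode=(?P<mode>[0-7]+), size=(?P<size>\d+)\)"
)


class Orphan(NamedTuple):
    """One orphaned inode reported by fsck (hypothetical helper type)."""
    inode: int
    uid: int
    gid: int
    mode: int  # parsed from the octal string in the log
    size: int

    def describe(self) -> str:
        # stat.filemode turns the numeric mode into an ls-style string.
        return (
            f"inode {self.inode}: {stat.filemode(self.mode)} "
            f"uid={self.uid} gid={self.gid} size={self.size}"
        )


def parse_orphans(log_text: str) -> List[Orphan]:
    """Return every orphaned-inode entry found in the given log text."""
    found = []
    for m in ORPHAN_RE.finditer(log_text):
        found.append(
            Orphan(
                inode=int(m.group("inode")),
                uid=int(m.group("uid")),
                gid=int(m.group("gid")),
                mode=int(m.group("mode"), 8),  # mode is printed in octal
                size=int(m.group("size")),
            )
        )
    return found


if __name__ == "__main__":
    sample = (
        "systemd-fsck[489]: /dev/vda3: recovering journal\n"
        "systemd-fsck[489]: /dev/vda3: Clearing orphaned inode 295083 "
        "(uid=0, gid=0, mode=0100755, size=60800)\n"
    )
    for orphan in parse_orphans(sample):
        print(orphan.describe())
```

Run against the line quoted from the bug, this describes inode 295083 as `-rwxr-xr-x`, that is, a regular executable file of 60800 bytes owned by root, which might help narrow down which package's file it was.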
> 
> Can anyone get a metadata image (e2image -Q /path/to/root/device image.qcow2) post-upgrade,
> before reboot tries to run fsck and fix things?

I can try, but at least by default, offline upgrade reboots
automatically at the end, so we'll have to interrupt that. I don't
recall offhand if there's a parameter you can use, or if I'll have to
do something hackier. (Or I suppose I can just interrupt the next boot
and boot a live image instead.)

Thanks for the ideas!
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@xxxxxxxxxxxxx
https://www.happyassassin.net
