Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure

On 18 Jul 2022, Roger Heflin told this:

Oh I was hoping you'd weigh in :)

> Did it drop you into the dracut shell (since you do not have
> scroll-back, this seems to be the case)?  Or did the machine fully
> boot up and simply not find the arrays?

Well, I'm not using dracut, which is, as you note, a horrifically
complex, failure-prone nightmare: I wrote my own (very simple) early
init script, which did exactly what it is designed to do when mdadm
doesn't assemble any arrays and dropped me into an emergency ash shell
(statically linked against musl). As a result I can be absolutely
certain that nothing has changed or been rebuilt in early init since I
built my last working kernel. (It's all assembled into an initramfs by
the kernel's own automated assembly machinery under the usr/
subdirectory in the source tree.)
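
The relevant part of that init script boils down to roughly this (a
simplified sketch, not the script verbatim: the array names are the
ones from the mdadm.conf quoted below, and the mdadm invocation is the
one quoted further down):

    #!/bin/sh
    # Simplified sketch of the early-init assembly step, not the real script.
    mount -t proc proc /proc
    mount -t sysfs sysfs /sys
    mount -t devtmpfs devtmpfs /dev

    /sbin/mdadm --assemble --scan --auto=md --freeze-reshape

    # If nothing got assembled, give up and hand over an emergency shell.
    if [ ! -b /dev/md/fast ] && [ ! -b /dev/md/slow ]; then
        echo "mdadm assembled nothing: dropping to ash" >&2
        exec /bin/ash
    fi
    # ... then bcache, lvm, root mount and switch_root as usual.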

Having the initramfs linked into the kernel is *such a good thing* in
situations like this: I can be absolutely certain that as long as the
data on disk is not fubared nothing can possibly have messed up the
initramfs or early boot in general after the kernel is linked, because
nothing can change it at all. :)
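
Concretely, that automated assembly machinery is just the kernel's
built-in initramfs support; the relevant bits of the .config look
something like this (the path here is purely illustrative):

    CONFIG_BLK_DEV_INITRD=y
    # a directory, a cpio archive, or a cpio_list description file
    CONFIG_INITRAMFS_SOURCE="usr/initramfs.list"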

(I just diffed the initramfses from the working and non-working
kernels: the working one stays mounted under /run/initramfs after boot
is over because it also gets used during late shutdown, and the new,
broken kernel's is still in the cpio archive in its build tree, so this
was easy. They're absolutely identical.)
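
For the record, that amounted to unpacking the cpio the new kernel's
build produced and recursively diffing it against the copy the running
kernel keeps mounted, roughly like this (paths approximate):

    # Paths illustrative: the cpio sits under usr/ in the new kernel's build tree.
    mkdir /tmp/new-initramfs && cd /tmp/new-initramfs
    cpio -idm < /path/to/build/usr/initramfs_data.cpio
    diff -r /run/initramfs /tmp/new-initramfs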

> If it dropped you to the dracut shell, add nofail on the fstab
> filesystem entries for the raids so it will let you boot up and debug.

Well... the rootfs is *on* the raid, so that's not going to work.
(Actually, it sits under a raid -> bcache -> lvm stack, with one of the
two raid arrays bcached and the lvm stretching across both of them. If
the raid alone had come up, I have a rescue fs on the non-bcached half
of the lvm, so I could have assembled the lvm in degraded mode and
booted from that. But the raid didn't come up, so I couldn't.)

Assembly does this:

    /sbin/mdadm --assemble --scan --auto=md --freeze-reshape

The initramfs includes this mdadm.conf:

DEVICE partitions
ARRAY /dev/md/transient UUID=28f4c81c:f44742ea:89d4df21:6aea852b
ARRAY /dev/md/slow UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2
ARRAY /dev/md/fast UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23

MAILADDR postmaster@xxxxxxxxxxxxx

(which is, again, identical to the one in the working kernels'
initramfs.)

> If it is inside the dracut shell it sounds like something in the
> initramfs might be missing.

Definitely not, thank goodness.

>                              I have seen a dracut (re)build "fail" to
> determine that a device driver is required and not include it in the
> initramfs, and/or have seen the driver name change and the new kernel

Oh yes: this is one of many horrible things about dracut. It assumes
the new kernel is similar enough to the running one that it can use the
one to configure the other. That is very much not true when the machine
is building kernels for *other* machines in containers (as this one
does) :) but the kernel that failed here was the machine's native one.

(The kernel is almost entirely non-modular: I only have the fat and
vfat modules loaded right now; everything else is built in.)

> I have also seen newer versions of software stacks/kernels
> create/ignore underlying partitions that worked on older versions (ie
> a partition on a device that has the data also--sometimes a
> partitioned partition).

I think I can be sure that only the kernel itself has changed here.
I wonder if make oldconfig messed up and I lost libata or something? ...
no, it's there.
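
Checking is cheap enough: grep the new kernel's .config, and/or diff it
against the last working config (paths illustrative):

    # In the new kernel's build tree.
    grep '^CONFIG_ATA=' .config    # libata core

    # Or look at everything that changed relative to the last working config.
    scripts/diffconfig /path/to/old/.config .config | less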

> Newer versions have suddenly seen that /dev/sda1 was partitioned,
> created a /dev/sda1p1, and "hidden" sda1 from scanning, causing LVM
> to not find PVs.
>
> I have also seen cases where the in-use/data device was /dev/sda1p1
> and an update broke partitioning-a-partition, so it only showed
> /dev/sda1 and no longer saw the devices.

Ugh. That would break things indeed! This is all atop GPT...

CONFIG_EFI_PARTITION=y

No, it's still there.
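
Next time I boot the broken kernel I can at least collect some of this
from the emergency shell; something along these lines, with the device
names obviously depending on what the kernel actually shows:

    # From the emergency ash shell on the broken kernel (device names illustrative).
    cat /proc/partitions
    mdadm --examine --scan
    mdadm --examine /dev/sda1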

I doubt anyone can conclude anything until I collect more info. This bug
report is mostly useless for a reason!


