Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure

Did it drop you into the dracut shell (since you do not have
scroll-back, this seems to be the case), or did the machine fully boot
up and simply fail to find the arrays?

If it dropped you to the dracut shell, add nofail to the fstab entries
for the filesystems on the raids so it will let you boot up and debug.
Also make sure you don't have rd_lvm_(vg|lv)= set to the devices used
for the raid on the kernel command line (this will also drop you to
dracut).
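
As a concrete sketch (the device name and mount point below are
hypothetical; use whatever your fstab actually lists for the arrays):

    # /etc/fstab -- nofail lets the boot continue even if the array is absent
    /dev/md0   /srv/data   xfs   defaults,nofail   0 2

and to check the running (working) kernel's command line for leftover
dracut LVM activation arguments:

    tr ' ' '\n' < /proc/cmdline | grep -i 'rd.lvm'

(if memory serves, current dracut spells these rd.lvm.vg=/rd.lvm.lv=
and older setups used rd_LVM_VG=/rd_LVM_LV=; the pattern above matches
both).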

I have done that on my main server; the goal is to avoid the
no-network/no-logging dracut shell if at all possible so the problem
can be debugged over the network.

If it drops you inside the dracut shell, it sounds like something in
the initramfs might be missing.  I have seen a dracut (re)build "fail"
to determine that a device driver is required and not include it in the
initramfs, and/or have seen the driver name change and the new kernel
and dracut not find the new name.  If it is this, then rebuilding with
hostonly=no (include all drivers) would likely make it work for the
immediate future.
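
Assuming a dracut-built initramfs, something like this (the image path
is a guess at the usual /boot layout; adjust to whatever your
bootloader actually loads):

    # rebuild the 5.18.12 initramfs with all drivers, not just host-only ones
    dracut --force --no-hostonly /boot/initramfs-5.18.12.img 5.18.12

    # then confirm the raid pieces actually made it in
    lsinitrd /boot/initramfs-5.18.12.img | grep -iE 'raid|mdadm'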

I have also seen newer versions of software stacks/kernels
create/ignore underlying partitions that worked on older versions (ie
a partition on a device that also holds the data
directly--sometimes a partitioned partition).

Newer versions have suddenly seen that /dev/sda1 was itself
partitioned, created a /dev/sda1p1, and "hidden" sda1 from scanning,
causing LVM not to find its pvs.

I have also seen the opposite, where the in-use data device was
/dev/sda1p1 and an update broke partitioning-a-partition so that only
/dev/sda1 showed up, and the devices were no longer seen.  Comparing
the block device view under the two kernels, as sketched below, should
show which of these (if either) is happening.
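
For instance:

    # run under both 5.16 and 5.18 and compare
    cat /proc/partitions
    lsblk -o NAME,SIZE,TYPE,FSTYPE
    pvs -a

If a device gains or loses a pN-style child between the two kernels,
that is the smoking gun.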





On Mon, Jul 18, 2022 at 8:16 AM Nix <nix@xxxxxxxxxxxxx> wrote:
>
> So I have a pair of RAID-6 mdraid arrays on this machine (one of which
> has a bcache layered on top of it, with an LVM VG stretched across
> both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just
> rebooted into 5.18.12 and it failed to assemble. mdadm didn't display
> anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape
> simply didn't find anything to assemble, and after that nothing else was
> going to work. But rebooting into 5.16 worked fine, so everything was
> (thank goodness) actually still there.
>
> Alas I can't say what the state of the blockdevs was (other than that
> they all seemed to be in /dev, and I'm using DEVICE partitions so they
> should all have been spotted) or anything else about the boot because
> console scrollback is still a nonexistent thing (as far as I can tell),
> it scrolls past too fast for me to video it, and I can't use netconsole
> because this is the NFS and loghost server for the local network so all
> the other machines are more or less frozen waiting for NFS to come back.
>
> Any suggestions for getting more useful info out of this thing? I
> suppose I could get a spare laptop and set it up to run as a netconsole
> server for this one boot... but even that won't tell me what's going on
> if the error (if any) is reported by some userspace process rather than
> in the kernel message log.
>
> I'll do some mdadm --examine's and look at /proc/partitions next time I
> try booting (which won't be before this weekend), but I'd be fairly
> surprised if mdadm itself was at fault, even though it's the failing
> component and it's old, unless the kernel upgrade has tripped some bug
> in 4.0 -- or perhaps 4.0 built against a fairly old musl: I haven't even
> recompiled it since 2019. So this looks like something in the blockdev
> layer, which at this stage in booting is purely libata-based. (There is
> an SSD on the machine, but it's used as a bcache cache device and for
> XFS journals, both of which are at layers below mdadm so can't possibly
> be involved in this.)
>
> --
> NULL && (void)
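
For the mdadm --examine pass mentioned above, something along these
lines should capture the superblock state of each member and what the
kernel currently sees (the sd[abcd]1 glob is a placeholder for whatever
your DEVICE partitions line actually resolves to):

    for d in /dev/sd[abcd]1; do mdadm --examine "$d"; done
    mdadm --detail --scan
    cat /proc/partitions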


