Re: Long wait for start job

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Sun, 13 Jun 2021 14:21:57 -0600

On Sun, Jun 13, 2021 at 3:56 AM Patrick O'Callaghan
<pocallaghan@xxxxxxxxx> wrote:
>
> On Sun, 2021-06-13 at 07:09 +0800, Ed Greshko wrote:
> > On 13/06/2021 06:57, Ed Greshko wrote:
> > > But, does your plot show a difference?
> >
> > Speaking of your plot.....
> >
> > Don't you think the time between
> >
> > sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.2.device
> > and
> > dev-disk-by\x2dpath-pci\x2d0000:00:14.0\x2dusb\x2d0:3:1.0\x2dscsi\x2d0:0:0:1.device
> >
> > worth looking into?
>
> Of course. That's precisely the issue I'm concerned about. I don't see
> what's causing it. My working hypothesis is that it's somehow related
> to the fact that the external dock supports two drives in a BTRFS RAID1
> configuration and that the kernel is verifying them when it starts up,
> even though the drives are not being mounted (they have an automount
> unit but nothing in /etc/fstab).
>
> Why it would delay the rest of the system startup while this is
> happening is something I don't understand. The delay is very visible (I
> get three dots on a blank screen while it's happening).

Short version:
Is this Btrfs raid1 listed at all in fstab? If so, add noauto,nofail
to the mount options and see if that clears it up.
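
As a rough way to check this, the sketch below lists non-root Btrfs
entries in an fstab-style file that lack either option. The function
name btrfs_fstab_check is made up for illustration; it only reads the
file and changes nothing.

```shell
# Hypothetical helper (not part of any package): report non-root Btrfs
# entries in an fstab-style file that lack noauto or nofail.
# Usage: btrfs_fstab_check [FILE]    (FILE defaults to /etc/fstab)
btrfs_fstab_check() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }          # skip comments, blank and short lines
        $3 == "btrfs" && $2 != "/" {
            opts = "," $4 ","                 # pad so each option is comma-delimited
            missing = ""
            if (opts !~ /,noauto,/) missing = missing " noauto"
            if (opts !~ /,nofail,/) missing = missing " nofail"
            if (missing != "") print $2 ": missing" missing
        }
    ' "${1:-/etc/fstab}"
}
```

Run with no argument it reads /etc/fstab and prints one line per entry
that is missing an option; no output means nothing was flagged.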

Long version:
Dracut handles mdadm array assembly. Normal (non-degraded) assembly is
done by dracut using the mdadm command; if that fails, dracut starts a
countdown loop, I think 300 seconds, before it tries a degraded
assembly. None of this exists for Btrfs raid in dracut. For one, Btrfs
raid assembly is combined with mount. Pointing the mount command at
any one of the member devices results in the kernel finding all the
other member devices automagically. If one or more members are
missing, mount fails. Since systemd only tries to mount one time, and
because it's fairly likely that mounting a multiple-device Btrfs as
/sysroot will fail because one or more devices aren't ready yet, there
is a udev rule to wait for everyone to get ready:

/usr/lib/udev/rules.d/64-btrfs.rules

The gotcha is that this simple rule waits indefinitely. The udev rule
is there to make sure a normal (non-degraded) boot doesn't incorrectly
fail just because one of the devices shows up 1s late. But if a drive
has actually failed, it results in a hang. Forever. You can add the
"x.systemd.timeout=300" boot parameter to approximate the rather long
dracut wait for mdadm. And at a dracut shell, you can then just:

mount -o degraded /dev/sdXY /sysroot
exit

And away you go. Of course this is non-obvious. And it needs to work
better. And it will, eventually.
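
To illustrate the difference between waiting indefinitely and waiting
with a timeout, here is a minimal sketch of a bounded wait. wait_ready
is a made-up helper, not something dracut or systemd ships. The example
in the comment assumes "btrfs device ready" from btrfs-progs, which as
I understand it exits 0 once all member devices of the file system are
registered with the kernel; /dev/sdXY is a placeholder, as above.

```shell
# Hypothetical sketch: poll a readiness check once a second, but give up
# after a deadline instead of waiting forever.
# Usage: wait_ready SECONDS COMMAND [ARGS...]
wait_ready() {
    remaining=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1    # deadline passed: report failure
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0                                  # the check succeeded in time
}

# Example (device name is a placeholder):
#   wait_ready 300 btrfs device ready /dev/sdXY || mount -o degraded /dev/sdXY /sysroot
```

The point is only that a timeout turns "hang forever" into "fail after
N seconds", at which point a degraded mount can be tried by hand.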

So the next gotcha is if /sysroot is not Btrfs. In this case there's a
bug in dracut that prevents this udev rule from being put into the
initramfs. That means anything that tries to mount a non-root Btrfs
during boot, whether via fstab or GPT discoverable partitions, might
fail if not all devices are ready at the time of the mount attempt.

https://github.com/dracutdevs/dracut/issues/947
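
One way to see whether a given initramfs is affected is to look for the
rule in its file listing. A small sketch, assuming dracut's lsinitrd is
installed; has_btrfs_rule is a made-up name for illustration:

```shell
# Hypothetical helper: read a file listing on stdin and succeed if the
# Btrfs udev rule (64-btrfs.rules) appears in it.
has_btrfs_rule() {
    grep -q '64-btrfs\.rules'
}

# Example, assuming lsinitrd is available (with no arguments it lists the
# initramfs for the running kernel; it may need root to read the image):
#   lsinitrd | has_btrfs_rule && echo "rule present" || echo "rule missing"
```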

This should be fixed in dracut 055. But if you already have an
initramfs built with 055 and this is a new problem, maybe we've got a
regression in 055 or something? I'm not sure yet... still kinda in the
dark on what's going wrong.

Also, it is possible it's not related to this btrfs file system at
all, but I'm throwing it out there just as something to be aware of.

-- 
Chris Murphy