Re: udev: timeout on WAIT_FOR_SYSFS

On Tue, Aug 2, 2011 at 16:57, Filipe Brandenburger <filbranden@xxxxxxxxx> wrote:
> I'm having some problems with a Red Hat 5 based machine that still
> uses the udev code before that patch. The problem is that sometimes
> the devices do not all show up properly by the time they are
> needed.
>
> The devices are on a SAN, behind a QLA2xxx HBA and on a
> Hitachi storage array.
>
> Occasionally (not always), I'm having a message such as this one during reboot:
> udevd-event[24448]: wait_for_sysfs: waiting for
> '/sys/devices/pci0000:00/0000:00:06.0/0000:0d:00.1/host1/rport-1:0-0/target1:0:0/1:0:0:4/ioerr_cnt'
> failed

That looks like the right kind of device to expect the SCSI device
attribute on, so there is no rule to fix.

> The target and LUN change sometimes but it's basically the same
> message. Soon after that, I have initialization of Veritas VxVM but
> when I get this message it fails as not all devices are available.
> Other times the boot process completes as usual and all devices are
> present. In a previous e-mail to Kay Sievers he suggested I might be
> missing a KERNEL=="[0-9]*:[0-9]*" filter for the SCSI udev rule, but I
> don't think that's the case as the issue is intermittent and happens
> only once every "x" reboots.

That's only needed for newer kernels on systems with ancient udev,
where the old rules match new devices which did not exist at the time
udev was released.
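
To illustrate what that filter does: KERNEL matches in udev rules use
shell-style globs, so the pattern can be tried against the names that
appear in the sysfs path from the log message. This is only a rough
sketch that uses Python's fnmatch as a stand-in for udev's own matcher;
the helper name is_scsi_device_name is made up for the example.

```python
from fnmatch import fnmatchcase

# Pattern from the suggested rule filter: KERNEL=="[0-9]*:[0-9]*"
SCSI_DEVICE_GLOB = "[0-9]*:[0-9]*"


def is_scsi_device_name(kernel_name: str) -> bool:
    """Return True if the kernel name matches the glob (hypothetical helper)."""
    return fnmatchcase(kernel_name, SCSI_DEVICE_GLOB)


# Path components taken from the log message quoted in this thread.
names = ["1:0:0:4", "target1:0:0", "rport-1:0-0", "host1"]
matches = {name: is_scsi_device_name(name) for name in names}

for name, matched in matches.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```

Only the H:C:T:L style name (1:0:0:4) matches, so with the filter in
place the rule no longer applies to the host, rport or target entries.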

We don't have any of the sysfs timing problems for common devices on
today's systems. We fixed them all.

Anyway, fixing the rule would only make the logged error go away, not
change anything else, or make a device appear which is missing.

> I tried adding a call to "udevsettle" just before initializing VxVM
> but it doesn't help, whenever I have the problem that triggers that
> log message the devices do not show up even after udevsettle. I also
> tried "udevtrigger" but that didn't help either.

Missing devices are likely not related to udev but to your
driver/hardware/setup. I guess that when the devices are missing, they
are also absent from /sys/block, right?
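
One way to check this on an affected boot is to compare the disks you
expect against what the kernel lists in /sys/block. A minimal sketch,
assuming you know the expected kernel names in advance; the helper
missing_block_devices and the example names are hypothetical.

```python
import os


def missing_block_devices(expected, sys_block="/sys/block"):
    """Return the expected block device names that have no entry in sys_block.

    'expected' is a list of kernel names such as "sda" (assumed to be known
    by the caller). The sys_block argument exists so the check can be pointed
    at another directory for testing.
    """
    try:
        present = set(os.listdir(sys_block))
    except FileNotFoundError:
        # No sysfs block directory at all: everything counts as missing.
        present = set()
    return sorted(name for name in expected if name not in present)


if __name__ == "__main__":
    # Example names only; substitute the disks your setup should have.
    print(missing_block_devices(["sda", "sdb"]))
```

If a disk is reported missing here, the kernel never registered the
block device, and no amount of udevsettle or udevtrigger will make it
appear.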

> I saw this patch that looked interesting, it's from almost 4 years ago
> but it resembles the codebase of RHEL5 that I'm running:
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=commit;h=39ea7c6c67de69379b603196a0eff6f7ce2e469a
>
> I'm pretty much considering applying that patch to udevd since it will
> probably fix it, but as I can't reproduce the problem reliably I
> wanted to ask some questions just to have more confidence in going on
> with that fix.

As said, that only makes the error logging go away, and maybe helps
some udev event users that expect proper sysfs timing.

> For instance, the log message is somewhat vague saying some SCSI disks
> take 6.5s to populate sysfs, does someone have some details of which
> kinds of disks cause that?

I don't really remember the details, but it was probably the disk
spin-up time that blocked the creation of the sysfs files for
seconds. The code that does that was all changed in later kernels to
be timed properly.

> Is this related to SAN disks? If this was
> experienced with qla2xxx driver and/or Hitachi SAN even better as that
> confirms the issue I'm having...
>
> Also, can someone tell what would cause a device to take long to
> populate sysfs? Is this related to the load (as in "Load Average") of
> the machine at the time the module is loaded? Could that be related to
> some heavy scripts being called from udev rules?
>
> And could someone please give me some idea of between which events the
> timeout is?

There is no "between"; it's all the same device. When the kernel sends
'add', sysfs is expected to be fully populated, which it isn't in the
old SCSI code. Udev works around that by looping in the event handler
until sysfs is ready.
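
The workaround amounts to a poll loop with a deadline. A minimal sketch
of the idea, not udev's actual C code; the function signature, timeout
and polling interval here are illustrative assumptions.

```python
import os
import time


def wait_for_sysfs(devpath, attr, timeout=10.0, interval=0.05):
    """Poll until devpath/attr exists or the timeout expires.

    Returns True if the attribute file appeared in time, False otherwise.
    The timeout and interval defaults are placeholders, not udev's values.
    """
    path = os.path.join(devpath, attr)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            # This is the case that produces the "waiting for ... failed" log.
            return False
        time.sleep(interval)
```

A False return corresponds to the logged failure: the event handler gave
up waiting, but the device itself may still show up later.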

> Is it from the point udev gets the event from a queue
> until the device shows up in sysfs? Does that depend on the driver (in
> this case qla2xxx) or the device itself? What can affect that timing?

It's the SCSI core that got changed. But again, it's mostly
cosmetic. It will not prevent any disk device node from being
created. Some additional symlinks might fail to get their properties,
and might not be created, but I don't remember anything like that.

Kay