On Sat, Dec 17, 2011 at 07:15:00PM +0100, Kay Sievers wrote:
> On Fri, Dec 16, 2011 at 18:46, Andy Whitcroft <apw@xxxxxxxxxxxxx> wrote:
> > In exit (udevadm control --exit) processing we currently dump the entire
> > incoming queue and ignore any further incoming kernel events. We then wait
> > for the current workers to complete any events they were assigned, before
> > finally exiting. However if any of the pending events should trigger a
> > nested event that new event would not be processed preventing completion
> > of the existing worker event. We will eventually timeout both leading
> > to a long boot delay and in some cases to partially initialised cards.
> > These cards often will not be repaired even during coldplug and are lost
> > requiring manual intervention to recover.
> >
> > Modify exit processing such that we handle events with timeliness
> > constraints on them, bringing them into line with normal processing.
> > This allows events which are triggered from our existing workers events
> > to run to completion and allow completion of those workers. This allows
> > us to flush the queue and exit cleanly.
>
> Is 'udevadm settle' called before doing --exit in the initramfs? If
> not, I guess that's why others have not seen that.

No, we wish to avoid 'settle'ing, as that would force us to wait for the
majority of device probes to complete, and we do not have the majority
of the modules in the initramfs to service them.

> In general, requesting firmware synchronously on module init sounds
> pretty broken. The firmware request should be async. If the device
> allow that, done as late as the first ifup of the netdev, and not at
> module load time.

I agree that the right thing is to fix the kernel drivers overall.
However, all of the drivers I have looked at seem to follow this model,
so I suspect this is not going to be a quick fix.

> If udev is not running, modprobe will hang until it runs into the
> timeout? Having module init depending on a 'userspace transaction'
> sounds pretty weird. How does that work when the module is
> compiled-in?

If udev is not running, we will not trigger any modprobes in response to
the device discovery events, and so cannot trigger firmware loads during
that time. If the module is built-in, then we will trigger the firmware
load event, and either udev is there to service it or it is lost.
However, the firmware object is coldplug-able, so when udev is finally
(re)started we will replay the firmware event and expedite the firmware
load, as it has timeliness set. It is only during the period when udev
is exiting that we can be running a modprobe but not be willing to
handle the event (without this patch).

> The TIMEOUT= is basically just a left-over from the times of the crazy
> /sbin/hotplug era, where all the hooked-in shell scripts took ages to
> handle events, and the kernel's firmware loaders default was 10
> seconds, and it ran into that all the time.
>
> The patch looks sensible, but I haven't really wrapped my head around
> if we really should make the TIMEOUT= handling even more special here.
> I rather see the firmware loading model fixed for the affected
> drivers, as I think it is very wrong, for many other reasons too, what
> they do here.

I believe that the change makes TIMEOUT= handling consistently special,
i.e. it always triggers expedited handling, regardless of whether we are
running normally or in the process of exiting. Plus, by handling these
events we can avoid deadlocking the modprobe for these bad drivers.

-apw
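
For illustration, the exit-time behaviour discussed above can be modelled
with a small sketch. This is not udevd's code: the Event class and the
events_to_run function are hypothetical names, and the model assumes the
only distinction that matters is whether the kernel set TIMEOUT= on the
event.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a queued kernel uevent."""
    devpath: str
    timeout: Optional[int] = None  # value of TIMEOUT=, if the kernel set one


def events_to_run(queue: Iterable[Event], exiting: bool) -> List[Event]:
    """Return the events the daemon would still hand to workers.

    Normal operation: every queued event is dispatched.
    Exiting: ordinary events are dropped, but events carrying TIMEOUT=
    (for example firmware requests) are still serviced, so a worker
    blocked in modprobe waiting on that event can complete.
    """
    if not exiting:
        return list(queue)
    return [ev for ev in queue if ev.timeout is not None]
```

With one ordinary event and one TIMEOUT= event queued, the model runs
both during normal operation and only the TIMEOUT= event while exiting,
which is the case that lets a pending modprobe finish.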