Hi there,

A race condition in udev has turned up in Ubuntu's initramfs, causing
60-second delays in the boot sequence because udevd loses track of one of
its workers: it's waiting for a pending event to be processed, but the
worker it was dispatched to has already exited because the event and signal
were received by the worker at the same time and the signal took precedence.

There are a few ways we could deal with this.

 - Let udevd clear its list harder when a worker exits, since if the
   process hasn't already sent back its information it's not going to do so
   from beyond the grave and we might as well clean up the references to
   it.  This seems to be safe because fd_worker is processed before
   fd_signal in the main event loop, so any last words will be duly
   recorded.

 - Reorder the event loop so that signals and the control socket are
   processed before the event queue.  This way, udev would never dispatch a
   new event to a worker after it's been told to exit, which is what
   happens now.  However, there may be reasons for the current order that I
   don't see offhand; and anyway, I don't think the reordering is a
   guarantee that the event and signal wouldn't arrive at the same time, it
   just makes it orders of magnitude less likely.

 - When a worker sees that an event has arrived, process it immediately.
   This ensures everything udevd has delegated is finished before the
   worker handles the signal (i.e., exits).  This means the worker will
   take longer to finish up in some cases, but probably not 60 seconds
   longer...

The below patch implements option three.  Maybe one of the other options
should also be implemented, but in my testing #3 seems to be sufficient in
its own right to solve the problem.

An implicit assumption in this patch is that, when an event and signal
arrive together, the event is returned *first* from epoll_wait().  This has
been the case in my tests, but I haven't spotted any reason that it's
guaranteed to always be the case.
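To make the ordering dependency in option three concrete, here is a small
standalone model of the worker's scan over the ready list.  It is only a
sketch and not udev code: enum source, event_survives() and the
break_after_event flag are hypothetical names made up for illustration, and
the model assumes that any signal seen is the exit signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the worker's two descriptors
 * (fd_monitor and fd_signal in udevd.c). */
enum source { SRC_MONITOR, SRC_SIGNAL };

/*
 * Simplified model, not udev code: walk one ready list the way the
 * worker's for-loop walks the epoll_wait() results, and report whether
 * the dispatched event gets processed before the worker exits.
 * Assumption: every signal in the list is the exit signal.
 */
static bool event_survives(const enum source *ready, size_t n,
                           bool break_after_event)
{
	bool have_dev = false;

	assert(ready != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ready[i] == SRC_MONITOR) {
			/* corresponds to udev_monitor_receive_device() */
			have_dev = true;
			if (break_after_event)
				break;	/* patched loop: stop scanning, go process */
		} else if (ready[i] == SRC_SIGNAL) {
			/* exit signal: leave at once, dropping any event
			 * that was received but not yet processed */
			return false;
		}
	}

	/* The worker goes on to process the device, if it has one; a
	 * signal that was not read stays queued for the next wait. */
	return have_dev;
}
```

With the ready list { SRC_MONITOR, SRC_SIGNAL }, the model loses the event
when break_after_event is false and keeps it when the flag is true.  With
{ SRC_SIGNAL, SRC_MONITOR } the event is lost either way, which is the
implicit assumption about epoll_wait() ordering mentioned above.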

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@xxxxxxxxxx                                     vorlon@xxxxxxxxxx

From 84ead317ea041efa066b5134d5df0fd6fc075aa9 Mon Sep 17 00:00:00 2001
From: Steve Langasek <steve.langasek@xxxxxxxxxxxxx>
Date: Sat, 8 Oct 2011 01:34:32 -0700
Subject: [PATCH] Process events before signals in worker

When a worker receives both a signal and a udev event in the same
epoll_wait run, the event must be processed first because the udev parent
considers the event already dispatched.  If we process the signal first and
exit, udevd times out after 60 seconds waiting for a response from an
already-dead worker.

Ref: https://bugs.launchpad.net/bugs/818177

Signed-off-by: Steve Langasek <steve.langasek@xxxxxxxxxxxxx>
---
 udev/udevd.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/udev/udevd.c b/udev/udevd.c
index 77aec9d..b65b53f 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -347,6 +347,7 @@ static void worker_new(struct event *event)
 			for (i = 0; i < fdcount; i++) {
 				if (ev[i].data.fd == fd_monitor && ev[i].events & EPOLLIN) {
 					dev = udev_monitor_receive_device(worker_monitor);
+					break;
 				} else if (ev[i].data.fd == fd_signal && ev[i].events & EPOLLIN) {
 					struct signalfd_siginfo fdsi;
 					ssize_t size;
-- 
1.7.5.4