On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
>> The thread- and worker-versions do not create as many COW page-faults in
>> the daemon after every cloned event-process, and therefore need much
>> less CPU.
>>
>> At least on the dual-core laptop here, the pool of workers seems to be
>> faster than the threads.
>
> Great, you beat me to it. It makes sense that this would be a _bit_ faster
> than threads. As I say, I tried to avoid the page faults caused (somehow)
> by forking the extra thread-stack mappings, but it didn't really work out.
> I'm surprised by quite how much faster it is, though!

Yeah, the 1.2 sec system time looked a bit too scary.

> I have some thoughts which may help in bringing the code up from "rough
> hack" quality :-).

Cool. :)

> Aren't signal queues unreliable? If you exceed the maximum number of
> queued signals, sigqueue will fail with EAGAIN, and I don't think there's
> a blocking version :-).

Right, it's an rlimit, and I think I checked and remember 40,000+ signals
here. The workers could detect that and re-send a non-queued signal, if
needed.

> I think a pipe would be better

Maybe. I was lazy and tried to avoid file descriptors, in case we get a
really large number of workers to maintain. :)

> or maybe you can do something with netlink.

Right, but then, I guess, we would need to do MSG_PEEK, or something, to
find out whether the main daemon received a kernel message or a worker
message, before trying to do a receive_device().

> + /* wait for more device messages from udevd */
> + do
> +         dev = udev_monitor_receive_device(monitor);
> + while (dev == NULL);
>
> I think this loop should finish when the signal handler sets worker_exit?
> But maybe you didn't actually install a handler for SIGTERM and it's
> still being reset to the default action.

Yeah, it's not handled with worker_exit. Right now, it's not reliable to
kill event processes from anything other than the main daemon.
The worker_exit might be nice for valgrind tests, though.

> I'm not sure you're handling worker processes crashing, it looks like you
> might leave the event in limbo if you get SIGCHLD for a worker in state
> WORKER_RUNNING. I'm sure you'll test that though.

Maybe we can set the event back to QUEUED when a worker dies with an event
attached.

> Talking of unexpected crashes. I anticipated using a connection-oriented
> socketpair for passing events. Using netlink for this means the worker
> threads don't get a nice notification if the daemon dies without killing
> them.

Right, that's not too nice. I guess we have pretty much lost if the main
daemon dies unexpectedly. We need to find out whether we want netlink and
signals, or socketpair()s, here, I guess.

> Unless this is covered by udevd being the session leader, or
> something like that? I'll test this empirically - maybe it's not
> important, but I think it's good to know what will happen.

If the main daemon exits normally, it sends TERM to the entire process
group.

> BTW I had a go at this too, but a couple of my workers fail in about 1
> run out of 20, apparently because calloc() returns NULL without setting
> errno. I'm sure it's my fault but I'll try to track it down. Obviously
> I'll let you know if your patch could be affected by the same problem.

Oh, strange. Would be good to know what causes this.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html