On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
>> The thread- and worker-versions do not create as many COW page-faults in
>> the daemon after every cloned event-process, and therefore need much
>> less CPU.
>>
>> At least on the dual-core laptop here, the pool of workers seems to be
>> faster than the threads.
>
> Great, you beat me to it. It makes sense that this would be a _bit_ faster
> than threads. As I say, I tried to avoid the page faults caused (somehow)
> by forking the extra thread-stack mappings, but it didn't really work out.
> I'm surprised by quite how much faster it is, though!

Yeah, the 1.2 sec system time looked a bit too scary.

> I have some thoughts which may help in bringing the code up from "rough
> hack" quality :-).

Cool. :)

> Aren't signal queues unreliable? If you exceed the maximum number of
> queued signals, sigqueue will fail with EAGAIN, and I don't think there's
> a blocking version :-).

Right, it's an rlimit, and I think I checked and remember 40,000+ signals
here. The workers could detect that and re-send a non-queued signal, if
needed.

> I think a pipe would be better

Maybe. I was lazy and tried to avoid file descriptors, in case we get a
really large number of workers to maintain. :)

> or maybe you can do something with netlink.

Right, but then, I guess, we would need to do MSG_PEEK, or something, to
find out whether the main daemon received a kernel message or a worker
message, before trying to do a receive_device().

> + /* wait for more device messages from udevd */
> + do
> +         dev = udev_monitor_receive_device(monitor);
> + while (dev == NULL);
>
> I think this loop should finish when the signal handler sets worker_exit?
> But maybe you didn't actually install a handler for SIGTERM and it's
> still being reset to the default action.

Yeah, it's not handled with worker_exit. Right now, it's not reliable to
kill event processes from anything other than the main daemon.
The worker_exit might be nice for valgrind tests, though.

> I'm not sure you're handling worker processes crashing, it looks like you
> might leave the event in limbo if you get SIGCHLD for a worker in state
> WORKER_RUNNING. I'm sure you'll test that though.

Maybe we can set the event back to QUEUED when a worker dies with an event
attached.

> Talking of unexpected crashes. I anticipated using a connection-oriented
> socketpair for passing events. Using netlink for this means the worker
> threads don't get a nice notification if the daemon dies without killing
> them.

Right, that's not too nice. I guess we have pretty much lost if the main
daemon dies unexpectedly. We need to find out whether we want netlink and
signals, or socketpair()s, here, I guess.

> Unless this is covered by udevd being the session leader, or
> something like that? I'll test this empirically - maybe it's not
> important, but I think it's good to know what will happen.

If the main daemon exits normally, it sends TERM to the entire process
group.

> BTW I had a go at this too, but a couple of my workers fail in about 1
> run out of 20, apparently because calloc() returns NULL without setting
> errno. I'm sure it's my fault but I'll try to track it down. Obviously
> I'll let you know if your patch could be affected by the same problem.

Oh, strange. Would be good to know what causes this.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html