Hello,

On Mon, Feb 06, 2023 at 10:18:11PM +0000, Luck, Tony wrote:
> Imagine some AI training application with one process running per core on
> a server with a hundred or so cores. Each of these processes wants periodically
> to share work so far on a subset of the problem with one or more other processes.
> The "virtual windows" allow an accelerator device to copy data between a region
> in the source process (the owner of the virtual window) and another process that
> needs to access/supply updates.
>
> Process tree is easy if the test is just "do these two tasks have the same getppid()?"
> Seems harder if the process tree is more complex and I want "Are these two processes
> both descended from a particular common ancestor?"
>
> Using fd passing would involve an O(N^2) step where each process talks to each
> other process in turn to complete a link in the mesh of connections. This would need
> to be repeated if additional processes are started.

Wouldn't it be more usual for the parent to create the fd and let all the
children share it? Even if it isn't the parent, there can always be a main
process that sends the fd to whoever needs it.

> It would be much nicer to have an operation that matches what the applications
> want to do, namely "I want to broadcast-share this with all my peers".
>
> [N.B. I've suggested that these folks should just re-write their applications to
> simply attach to a giant blob of shared memory, and thus avoid all of this. But
> that doesn't fit for various reasons]

I'm not sure it'd be a good idea to introduce a whole new mode of access
control for this when it's something that can be addressed with more
conventional mechanisms. It may take a bit more upfront work, but one-off
security / naming mechanisms feel like they have a reasonable chance of
causing long-term headaches.

Thanks.

--
tejun
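The conventional mechanism referred to above (a main process handing an fd to whoever needs it) can be sketched with standard Unix fd passing. This is a minimal sketch, assuming Linux/POSIX: the main process sends an fd over an AF_UNIX socket as SCM_RIGHTS ancillary data, and the worker uses the received fd even though it dropped its inherited copies. A pipe's write end stands in for the hypothetical "virtual window" fd, and `send_fd`, `recv_fd` and `demo_share` are illustrative helpers, not part of any driver API. Children forked after the fd is created would simply inherit it, which covers the "parent creates the fd" case without any passing at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                 /* send >= 1 byte of real data with the cmsg */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    assert(cm != NULL);              /* buffer is sized for exactly one header */
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd, or -1 on error/EOF. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Main process passes a pipe's write end (stand-in for the shared fd) to a
 * worker, which writes "hello" through it. Returns bytes read back, or -1. */
static int demo_share(char *out, size_t outlen)
{
    int sv[2], pfd[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* worker */
        close(sv[0]);
        close(pfd[0]);
        close(pfd[1]);               /* drop inherited copies; use only the passed fd */
        int fd = recv_fd(sv[1]);
        if (fd < 0)
            _exit(1);
        const char msg[] = "hello";
        ssize_t n = write(fd, msg, sizeof(msg) - 1);
        _exit(n == (ssize_t)(sizeof(msg) - 1) ? 0 : 1);
    }
    close(sv[1]);
    int sent = send_fd(sv[0], pfd[1]);
    close(pfd[1]);                   /* in-flight/received copy keeps the pipe open */
    close(sv[0]);                    /* on send failure the worker sees EOF and exits */
    int status;
    waitpid(pid, &status, 0);
    ssize_t n = read(pfd[0], out, outlen - 1);
    close(pfd[0]);
    if (sent < 0 || n < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

For example, `char buf[16]; demo_share(buf, sizeof(buf));` should leave `"hello"` in `buf`. In this arrangement each worker needs only a single connection to the main process, so handing the fd to N workers takes N sends rather than a full mesh of pairwise exchanges.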