On Fri, Feb 05, 2021 at 06:28:13PM -0500, Jeff Hostetler wrote: > > OK. I'm still not sure of the endgame here for writing non-racy code to > > establish the socket (which is going to require either some atomic > > renaming or some dot-locking in the caller). But it's plausible to me > > that this option will be a useful primitive. > > In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have > code in the calling layer to (try to) handle both the startup races > and basic collisions with existing long-running servers already using > the socket. There you make a temp socket and then try to rename it into place. But because rename() overwrites the destination, it still seems like two creating processes can race each other. Something like: 0. There's no "foo" socket (or maybe there is a stale one that nobody's listening on). 1. Process A wants to become the listener. So it creates foo.A. 2. Process B likewise. It creates foo.B. 3. Process A renames foo.A to foo. It believes it will now service clients. 4. Process B renames foo.B to foo. Now process A is stranded but doesn't realize it. I.e., I don't think this is much different than an unlink+create strategy. You've eliminated the window where a process C shows up during steps 3 and 4 and sees no socket (because somebody else is in the midst of a non-atomic unlink+create operation). But there's no atomicity between the "ping the socket" and "create the socket" steps. > But you're right, it might be good to revisit that as a primitive at > this layer. We only have 1 other caller right now and I don't know > enough about `credential-cache--daemon` to know if it would benefit > from this or not. Yeah, having seen patch 14, it looks like your only new caller always sets the new unlink option to 1. So it might not be worth making it optional if you don't need it (especially because the rename trick, assuming it's portable, is superior to unlink+create; and you'd always be fine with an unlink on the temp socket). The call in credential-cache--daemon is definitely racy. It's pretty much the same thing: it pings the socket to see if it's alive, but is still susceptible to the problem above. I was was never too concerned about it, since the whole point of the daemon is to hang around until its contents expire. If it loses the race and nobody contacts it, the worst case is it waits 30 seconds for somebody to give it data before exiting. It would benefit slightly from switching to the rename strategy, but the bigger race would remain. -Peff