credential-cache--daemon: clarify "exit" action semantics When this code was originally written, there wasn't much thought given to the timing between a client asking for "exit", the daemon signaling that the action is done (with EOF), and the actual cleanup of the socket. However, we need to care about this so that our test scripts do not end up racy (e.g., by asking for an exit and checking that the socket was cleaned up). The code that is already there happens to behave very reasonably; let's add a comment to make it clear that any changes should retain the same behavior. So with the proposed change, t0301 will now fail racily. We need that unlink() to happen before the fclose(). Just calling exit() does things in the right order, but it should also be OK to do an explicit: delete_tempfile(&socket_file); fclose(in); fclose(out); That would probably require making socket_file a global variable. (You can't just return out of the serving loop, since that closes the sockets first before deleting the tempfile). > Clients can signal the git-credential-cache(1) daemon that it is > supposed to exit by sending it an "exit" command. The details around > how exactly the daemon exits seem to be rather intricate as spelt out by > a comment surrounding our call to exit(3p), as we need to be mindful > around closing the client streams before we signal the client. > > The logic is broken on Cygwin though: when a client asks the daemon to > exit, they won't see the EOF and will instead get an error message: > > fatal: read error from cache daemon: Software caused connection abort > > This issue is known in Cygwin, see for example [1], but the exact root > cause is not known. > [...] > [1]: https://github.com/cygporter/git/issues/51 I don't see any details at that issue, but I'm not sure how it would fix things. From the client's perspective, they are going to see the descriptor either way. Unless there is some magic that fclose() does that a normal descriptor-close-on-exit does not do. That "Software caused connection abort" thing seems like a weird not-standard-Unix errno value. Searching for it mostly yields people complaining about getting it from ssh under cygwin. :) If the magic that cygwin needs is actually "fclose before unlink", then that is in opposition to other platforms (and I suspect would make t0301 racy there). > As it turns out, we can avoid the issue by explicitly closing the client > streams via fclose(3p). I'm not sure at all where the claimed atexit(3p) > handler mentioned in the comment is supposed to live, but from all I can > see we do not have any installed that would close the sockets for us. So > this leaves me with a bit of a sour taste overall. The mention of atexit is a little oblique these days. We moved from a manual atexit handler to using the tempfile.c handler in 9e9033166b (credential-cache--daemon: use tempfile module, 2015-08-10). > That being said, I couldn't spot anything obviously wrong with closing > the streams ourselves, and it does fix the issue on Cygwin without any > regressions on other platforms as far as I can see. So let's go for this > fix, even though I cannot properly explain it. Running t0301 with --stress on Linux failed for me after a minute or so. -Peff