On Wed, Jul 13, 2022 at 03:42:38PM +0100, Adam Dinwoodie wrote: > Having now spent far more time than I'd like wrangling the Cygwin build > infrastructure, I've found the change in the Cygwin code that introduced > the break. But I'm afraid progressing beyond this -- and in particular > making any sort of judgement about appropriate next steps -- is beyond > what I'm going to have time for in the foreseeable future. Hmm. So I had assumed that the problem was unlink()ing the socket path while the client was still trying to read(). If that's the case, then I _think_ the minimal reproduction below should also trigger the problem. That might give you something more useful to show to Cygwin folks. But... > commit ef95c03522f65d5956a8dc82d869c6bc378ef3f9 (HEAD, refs/bisect/bad) > Author: Corinna Vinschen <corinna@xxxxxxxxxxx> > Date: Tue Apr 6 21:35:43 2021 +0200 > > Cygwin: select: Fix FD_CLOSE handling > > An FD_CLOSE event sets a socket descriptor ready for writing. > This is incorrect if the FD_CLOSE is a result of shutdown(SHUT_RD). > Only set the socket descriptor ready for writing if the FD_CLOSE > is indicating an connection abort or reset error condition. > > This requires to tweak fhandler_socket_wsock::evaluate_events. > FD_CLOSE in conjunction with FD_ACCEPT/FD_CONNECT special cases > a shutdown condition by setting an error code. This is correct > for accept/connect, but not for select. In this case, make sure > to return with an error code only if FD_CLOSE indicates a > connection error. ...this is about select outcomes. Which makes me think two things (though these are really pretty blind guesses, as I'm not at all familiar with Cygwin's socket code): - it may be that this situation was always ECONNABORTED (it's not on Linux, but perhaps due to details of the underlying socket code on Windows, it's different there). And this patch is just surfacing that state to the caller better. - it could be unrelated to the unlink() entirely. The socket code in Git dups the client descriptor and opens two FILE* handles, one for reading and one for writing. Could it be that it's important to fclose() one before the other (and the implicit closing done by exit() does the wrong order)? It seems like a stretch, but this commit message is talking about a shutdown(SHUT_RD), which is kind of what you get by fclose-ing one side (though again, not on Linux, because the "FILE*" are aware that they are read/write, but the underlying descriptors aren't). Or maybe those are just totally off track. I know you said you don't have time to dig further, and that's fine. But if you or anybody has a chance to try the program below, it might be interesting to see the result (it would confirm whether it's the unlink() that's the problem). -- >8 -- #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/socket.h> #include <sys/un.h> #define SOCKET_PATH "mysocket" static void diesys(const char *syscall) { perror(syscall); exit(1); } static int create_server(void) { int fd; struct sockaddr_un sa = { .sun_family = AF_UNIX, .sun_path = SOCKET_PATH, }; fd = socket(AF_UNIX, SOCK_STREAM, 0); if (fd < 0) diesys("socket"); if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) diesys("bind"); if (listen(fd, 5) < 0) diesys("listen"); return fd; } static void do_server(int listen_fd) { int client_fd = accept(listen_fd, NULL, NULL); if (client_fd < 0) diesys("accept"); if (unlink(SOCKET_PATH) < 0) diesys("unlink"); if (close(client_fd) < 0) diesys("close(client)"); close(listen_fd); } static void do_client(void) { int fd; char buf[64]; struct sockaddr_un sa = { .sun_family = AF_UNIX, .sun_path = SOCKET_PATH, }; fd = socket(AF_UNIX, SOCK_STREAM, 0); if (fd < 0) diesys("socket"); if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) diesys("connect"); if (read(fd, buf, sizeof(buf)) < 0) diesys("read"); close(fd); } int main(void) { int fd = create_server(); if (fork()) { do_server(fd); } else { close(fd); do_client(); } return 0; }