On Thu, 7 Jul 2022 at 16:17, Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxxx> wrote:
> [
> Jeff: sorry for not CC:-ing you on the original email - I had intended
> to do just that, but forgot! :(
>
> See: https://lore.kernel.org/git/9dc3e85f-a532-6cff-de11-1dfb2e4bc6b6@xxxxxxxxxxxxxxxxxxxx/
> ]
>
> On 07/07/2022 07:15, Junio C Hamano wrote:
> > Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxxx> writes:
> >
> >> However, I had some time to kill tonight, so I decided to take a _quick_ look
> >> to see if there was something that could be done ... (famous last words).
> >> ...
> >> diff --git a/builtin/credential-cache.c b/builtin/credential-cache.c
> >> index 78c02ad531..84fd513c62 100644
> >> --- a/builtin/credential-cache.c
> >> +++ b/builtin/credential-cache.c
> >> @@ -27,7 +27,7 @@ static int connection_fatally_broken(int error)
> >>
> >>  static int connection_closed(int error)
> >>  {
> >> -	return (error == ECONNRESET);
> >> +	return (error == ECONNRESET) || (error == ECONNABORTED);
> >>  }
> >
> > This feels like papering over the problem.
>
> Agreed, ... which is what I really meant by "(Well, it side-steps the
> problem, really)."
>
> >> Having noticed that the 'timeout' test was not failing, I decided to try
> >> making the 'action=exit' code-path behave more like the timeout code, as
> >> far as exiting the server is concerned. Indeed, you might ask why the
> >> timeout code doesn't just 'exit(0)' as well ...
> >>
> >> Anyway, the following patch does that, and it also provides a 'fix' for this
> >> issue!
> >
> > If this codepath was written like this (i.e. [PATCH 1C]) from the
> > beginning, I would have found it very sensible (i.e. instead of
> > calling exit() in the middle of the infinite client serving loop,
> > exiting the loop cleanly is easier to follow and maintain), even if
> > we didn't know the issue on Cygwin you investigated.
>
> Yep, apart from the variable name, I quite like the approach taken by
> the 1C patch.
>
> All three of these patches were really just "showing my working" and
> allowing anyone to "follow along" without the hassle of trying to
> scrape the diffs from the email.
>
> As I said, I don't think we can determine a suitable fix without first
> finding the cygwin commit which caused this test failure. But if we
> can't determine this, for whatever reason, then I would favour a patch
> to git based on the 1C patch. (Writing the commit message to justify the
> change, without mentioning this cygwin issue, may be more challenging! :)

I've been trying to dig into this; I've essentially never played with
the code for Cygwin itself until now, but I suspect I'm probably one of
the best-placed folks to actually do that investigation.

Unfortunately, I've gone as far back as 18 December 2020 in the code for
the Cygwin DLL itself, and I'm still seeing t0301 failing in exactly the
same way. There are a few possible explanations for that, but my guess is
either (a) the issue isn't in the Cygwin DLL itself but in some other
library that was updated around the same time, or (b) I'm not managing
as clean a build as I'm aiming for, and my builds of the old Cygwin
commits are being polluted by something in my current environment.

Either way I think I can make progress: my next step is to (temporarily)
give up on bisecting by commit in the repository that tracks the Cygwin
DLL, and instead bisect by time using the Cygwin Time Machine, which
should let me get an entire Cygwin environment as it would have been at
some point in the past.

> Also, I would like to understand why the code is written as it is
> currently. I'm sure there must be a good reason - I just don't know
> what it is! I suspect (ie I'm guessing), it has something to do with
> operating in a high contention context [TOCTOU on socket?] ... dunno. ;-)
>
> ATB,
> Ramsay Jones
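
For anyone following along without the patches to hand, here is a rough
sketch of the two ideas discussed above, as I understand them. It is not
the 1C patch, which isn't reproduced in this message. Only
connection_closed() is taken from the quoted diff; serve_one_request()
and serve_loop() are hypothetical names made up for illustration, and an
array of action strings stands in for real client connections.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * As in the quoted diff: treat ECONNABORTED the same as ECONNRESET
 * when deciding whether the peer has closed the connection.
 */
static int connection_closed(int error)
{
	return (error == ECONNRESET) || (error == ECONNABORTED);
}

/*
 * Hypothetical stand-in for handling one client. Returns 0 when the
 * server should stop, rather than calling exit(0) on "exit".
 */
static int serve_one_request(const char *action)
{
	if (!strcmp(action, "exit"))
		return 0;
	printf("handled action=%s\n", action);
	return 1;
}

/*
 * Hypothetical serving loop: leaves the loop cleanly when asked to,
 * so any cleanup after the loop runs on every shutdown path.
 * Returns the number of requests handled before stopping.
 */
static int serve_loop(const char **actions, int n)
{
	int handled = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!serve_one_request(actions[i]))
			break;
		handled++;
	}
	/* cleanup (closing the listening socket, etc.) would go here */
	return handled;
}
```

The point of the second half is just that an "exit" request and any
other reason for stopping (a timeout, say) can share a single path out
of the loop, which is the property Junio describes as easier to follow
and maintain.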