[jch: a patch that hopefully is applicable is attached at the end; I'd appreciate input from Paolo, the contributor of the original upstream] "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes: > After a whole lot of investigating, we (it is a large "we") have discovered > the reason for the hang we occasionally get in git-upload-pack on HPE > NonStop servers - reported here well over a year ago. This resulted from a > subtle check that the operating system does on file descriptors. When it > sees random values in pfd[i].revents,... What do you mean by "random values"? Where do these random values come from? The original code the patch touches is _not_ setting anything, so it is leaving it uninitialized and that is seen by the system? If that is the case perhaps should we clear it before we call into this codepath? > .... There is a non-destructive fix > that I would like to propose for this that I have already tested. I am not sure in what sense this is "non-destructive"; we left the value as it was when we didn't notice anything happening in compute_revents(). Now we explicitly destroy the value there was (just like we do in the case where the corresponding file descriptor was negative). Maybe overwriting with 0 is the right thing, but I wouldn't call it "non-destructive", though. Puzzling... > --- a/compat/poll/poll.c > +++ b/compat/poll/poll.c > @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout) > pfd[i].revents = happened; > rc++; > } > + else > + { > + pfd[i].revents = 0; > + } > } > > return rc; FYI, the upstream gnulib rewrites this part of the code with d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and it reads like this: + { + pfd[i].revents = (pfd[i].fd < 0 + ? 0 + : compute_revents (pfd[i].fd, pfd[i].events, + &rfds, &wfds, &efds)); + rc += pfd[i].revents != 0; + } The code before your fix (and before d42461c3) used to assign to pfd[i].revents only when compute_revents() gave us non-zero value, but with d42461c3 in the upstream, it pfd[i].revents is assigned the return value from compute_revents() even if the value returned is 0. And rc is incremented only when that value is non-zero. The result of applying your patch is equivalent, so after all, I tend to think that your patch is the right fix in the context of the code we have (i.e. the older code we borrowed from them). I wonder if we want to refresh the borrowed copy of poll.c to a newer snapshot someday, but that is totally a separate topic. At least, I now know that your fix in this patch will not be broken due to d42461c3 updating the code in a slightly different way ;-) Randall, I forged your Sign-off in the patch below, but please say it is OK, after reading Documentation/SubmittingPatches. Thanks. -- >8 -- From: Randall S. Becker <rsbecker@xxxxxxxxxxxxx> Subject: poll.c: always set revents, even if to zero. Match what happens to pfd[i].revents when compute_revents() returns 0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large fds", 2015-02-20). The revents field is set to 0, without incrementing the value rc to be returned from the function. This fixes occasional hangs in git-upload-pack on NPE NonStop. Signed-off-by: Randall S. Becker <rsbecker@xxxxxxxxxxxxx> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> --- compat/poll/poll.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/compat/poll/poll.c b/compat/poll/poll.c index b10adc780f..ae03b74a6f 100644 --- a/compat/poll/poll.c +++ b/compat/poll/poll.c @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout) pfd[i].revents = happened; rc++; } + else + { + pfd[i].revents = 0; + } } return rc;