RE: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform

On January 23, 2018 1:13 PM, Junio C Hamano wrote:
> 
> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
> 
> >> IOW, I do not see it explained clearly why this change is needed on
> >> any single platform---so "that issue may be shared by others, too"
> >> is a bit premature thing for me to listen to and understand, as "that
> >> issue" is quite unclear to me.
> >
> > v4 might be a little better. The issue seems to be specific to NonStop
> > in that its PIPE mechanism needs to have setbuf(pipe,NULL) called for
> > git to be happy.  The default behaviour appears to be different on
> > NonStop from other platforms from our testing. We get hung up waiting
> > on pipes unless this is done.
> 
> I am afraid that that is not a "diagnosis" enough to allow us moving forward.
> We get hung up because...?  When the process that has the other end of
> pipe open exits, NonStop does not close the pipe properly?  Or NonStop
> does not flush the data buffered in the pipe?
> Would it help if a compat wrapper that does setbuf(fd, NULL) immediately
> before closing the fd, or some other more targeted mechanism, is used only
> on NonStop, for example?  Potentially megabytes of data can pass thru a
> pipe, and if the platform bug affects only at the tail end of the transfer,
> marking the pipe not to buffer at all at the beginning is too big a hammer to
> work it around.  With the explanation given so far, this still smells more like
> "we have futzed around without understanding why, and this happens to
> work."  It may be good enough for your purpose of making progress (after
> all, I'd imagine that you'd need to work this around one way or another to
> hunt for and fix more issues on the platform), but it does not sound like "we
> know what the problem is, and this is the best workaround for that", which is
> what we want if it wants to become part of the official codebase.

As I feared, the test suite was unable to reproduce the issue without setbuf(NULL), primarily because the test structure puts both ends of the git dialogs for clone and fetch on the same CPU (even if on different IPUs). That configuration does not experience the issue, and we can't loop back through the platform's proprietary SSH. I am not comfortable releasing without the fix at this stage, but if you don't want to go forward with it, my team can run it internally for a few months in the hope that this works out for the better. The situation is timing related and is fine 99.98-ish% of the time. I really do want the setbuf present in any compiled versions that our community might get, primarily because I don't like sleepless nights chasing this down (again).

Cheers,
Randall



