Re: Problem accessing git.kernel.org with git v2.33 plus gitproxy

On Mon, Aug 30, 2021 at 09:28:45PM +0300, Kirill A. Shutemov wrote:

> > I can't reproduce the problem here, using core.gitproxy with a script
> > identical to what you showed above. I tried both cloning, and fetching
> > via both git-fetch and git-fetch-pack.
> 
> Could you try with a kernel repo?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 
> I found that not all repos on kernel.org trigger the issue.

Thanks, I was able to reproduce there (but not with git.git). That makes
me wonder if it's a race condition of some sort. My reproduction was
just:

  git init
  git config core.gitproxy /path/to/proxy/script
  git fetch-pack \
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git \
    refs/heads/master

In the meantime, a workaround is:

  git -c protocol.version=0 fetch-pack ...
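
To avoid retyping the -c option, the same setting can be stored in the
repository's config. A minimal sketch, assuming a throwaway repository
created with mktemp (the "repo" variable is purely illustrative; in
practice you would run the config command in the affected repository):

```shell
# Scratch repository, only for illustration; use your own repo in practice.
repo=$(mktemp -d)
git init --quiet "$repo"

# Pin this repository to protocol v0 so fetches there use the workaround.
git -C "$repo" config protocol.version 0

# Show the stored value.
git -C "$repo" config --get protocol.version
```

Once the proxy itself is fixed, "git config --unset protocol.version"
removes the setting again.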

> > If you can reproduce it at will and it fails on 2.33 but not earlier,
> > then bisecting might be helpful.
> 
> I did. See my other mail.

Yeah, looks like you found ae1a7eefff (fetch-pack: signal v2 server that
we are done making requests, 2021-05-19). I suspected that might be the
case.

I strace'd the underlying socat, and it does this (numbers on the left
are my annotations):

     select(6, [0 5], [1], [], NULL)         = 2 (in [0], out [1])
     recvfrom(3, 0x7ffee69f1f50, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
     recvfrom(3, 0x7ffee69f19d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[0]  read(0, "", 8192)                       = 0
     recvfrom(3, 0x7ffee69f19d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[1]  shutdown(5, SHUT_WR)                    = 0
     recvfrom(3, 0x7ffee69f1f50, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[2]  select(6, [5], [], [], {tv_sec=0, tv_usec=500000}) = 0 (Timeout)
     recvfrom(3, 0x7ffee69f1f50, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[3]  shutdown(5, SHUT_RDWR)                  = 0
     recvfrom(3, 0x7ffee69f2240, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
     exit_group(0)                           = ?

Here descriptors 0/1 are the pipes from/to Git, and 5 is the TCP socket
connected to the server. The recvfrom() calls are just noise, I think;
socat opens a dgram socketpair(), but doesn't seem to do anything with it.

So in [0] we see that Git has closed its write side of the pipe, due to
the new code in ae1a7eefff. socat then correctly relays the half-duplex
shutdown to the server in [1]. At this point it should wait for the
server to send the data, and relay it to stdout. And indeed, it does call
select() in [2]. But then when it hits the half-second timeout, it shuts
down completely in [3]!

I'm not that familiar with socat, but I've seen the same thing with
older versions of netcat: it wants to quit after seeing EOF on stdin.
This is useful to prevent deadlock if the server doesn't respond to a
half-duplex shutdown. But it's quite the wrong thing to do for a more
intelligent protocol.

That explains why you see the problem sometimes but not others. It
depends how long the server takes before it produces any output, which
in turn may depend on things like repo size. You said you didn't see it
when fetching from GitHub, but I suspect it is simply that GitHub's
server responds a little bit more quickly.
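
The race can be imitated without any network at all. This is only an
analogy, not socat's actual logic: timeout(1) plays the part of socat's
wait after EOF (scaled up to one whole second here, rather than the real
half second), and sleep stands in for a server that takes a while to
produce its first byte. The helper name "first_output_within" is made up
for illustration.

```shell
# first_output_within <wait-seconds> <server-delay-seconds>
# Succeeds if the pretend server speaks before the wait runs out;
# otherwise timeout(1) kills it and the call fails.
first_output_within() {
  timeout "$1" sh -c 'sleep "$1"; echo "first packet"' sh "$2"
}

# A quick server answers inside the window, so its data gets through.
first_output_within 1 0 && echo "fast server: data relayed"

# A slow server is cut off before it has said anything.
first_output_within 1 3 || echo "slow server: gave up before any data"
```

The same request succeeds or fails depending only on how the server's
delay compares to the wait, which matches the "some repos but not others"
behavior.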

In netcat, the fix is to use "-q" (though at least some versions of
netcat will wait forever by default these days, so it's not a problem).
In socat, it looks like "-t" does the same thing. And indeed, switching
the proxy to:

  socat -t 10 - "TCP:$1:$2"

makes the problem go away for me. The 10-second timeout might seem
arbitrary, but it should be reliable. Git's server-side has a keep-alive
mechanism that sends a packet every 5 seconds, even if no output has
been produced yet. So even if the request takes a long time to generate
any output, it should be plenty to tell socat that the connection is
still live.
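
Putting that together, a complete proxy script might look like the sketch
below. It is written to a scratch directory from mktemp purely for
illustration (the "proxy_script" name and location are assumptions, not
anything Git requires); the only meaningful part is the "-t 10".

```shell
# Scratch location for the example; pick a permanent path in practice.
proxy_script="$(mktemp -d)/git-proxy-socat"

cat >"$proxy_script" <<'EOF'
#!/bin/sh
# Git runs the core.gitproxy command with the host as $1 and the port as $2.
# -t 10: after EOF on stdin, wait up to 10 seconds (instead of the default
# 0.5) for the server before shutting the connection down.
exec socat -t 10 - "TCP:$1:$2"
EOF
chmod +x "$proxy_script"

echo "wrote $proxy_script"
```

Git can then be pointed at it with "git config core.gitproxy" and the
script's path, as in the reproduction recipe earlier in this mail.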

I am sympathetic that this used to work, and now doesn't. But this proxy
case is affected by the problem that ae1a7eefff was solving. The root of
the issue is just that "socat" in its default form is not doing the
right thing. So I'd prefer not to change Git's behavior here. But one
option would be to do the half-duplex shutdown only for ssh, and not for
git:// proxies (we already skip it for raw TCP git://, for reasons
discussed in that commit message).

-Peff


