Re: git push over HTTP; long delay with no progress, then hang?

On Fri, May 15, 2020 at 09:09:27PM -0700, Bryan Turner wrote:
> When running a huge "git push" via protocol v0/v1 over HTTP

By huge push you mean a lot of refs?

> (repository is ~10GB, with ~104,000 refs), I observe that:
> * Git makes an initial connection for a ref advertisement. This
> completes almost instantly because the repository is empty
> * "git push" then sits in absolute silence for ~10 minutes

I ran into this a few years ago; I remember waiting for 57 minutes ;)

> The process chain looks like:
> git push <URL>
>     git-remote-http <URL> <URL>
>         git send-pack --stateless-rpc --helper-status --thin
> --progress <URL> --stdin
> 
> The "git send-pack" process runs at 100% usage for a single CPU core
> for this entire duration. Does anyone have any insight into what Git
> might be doing during this long delay?

Refspec matching is, if I recall correctly,

  O(nr of refspecs * (nr of local refs + nr of remote refs))

with remote.c:count_refspec_match() responsible for the "nr of remote +
local refs" part and remote.c:match_explicit_refs() for the "nr of
refspecs" part.
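
To make that cost concrete, here is a rough Python sketch, not the
actual remote.c code.  The function names only mirror the C ones, and
the "matches if equal or if it is a trailing path component" rule is a
simplifying assumption of mine:

```python
def count_refspec_match(pattern, refs):
    """Scan every ref; count names equal to pattern or ending in "/" + pattern."""
    matches = 0
    comparisons = 0
    for name in refs:
        comparisons += 1
        if name == pattern or name.endswith("/" + pattern):
            matches += 1
    return matches, comparisons


def match_explicit_refs(refspecs, local_refs, remote_refs):
    """For each "src:dst" refspec, scan all local refs and all remote refs."""
    total_comparisons = 0
    for spec in refspecs:
        src, dst = spec.split(":", 1)
        _, c_local = count_refspec_match(src, local_refs)
        _, c_remote = count_refspec_match(dst, remote_refs)
        total_comparisons += c_local + c_remote
    return total_comparisons


# Hypothetical repository: 1000 local refs, 500 remote refs,
# and one fully qualified refspec per local ref.
local_refs = [f"refs/heads/topic-{i}" for i in range(1000)]
remote_refs = [f"refs/heads/topic-{i}" for i in range(500)]
refspecs = [f"{name}:{name}" for name in local_refs]

total_comparisons = match_explicit_refs(refspecs, local_refs, remote_refs)
print(total_comparisons)  # 1000 * (1000 + 500) = 1500000
```

With ~104,000 refs and as many refspecs the same loop would be on the
order of 10^10 name comparisons, which is at least consistent with
minutes of a single core at 100%.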

This is particularly bad for the http/https protocols, because 'git push'
expands your refspecs to fully qualified refspecs and passes them to 'git
send-pack', which then performs refspec matching _again_.  So if you
have a single refspec with globbing, 'git push' can still do its
matching fairly quickly, even if there are a lot of local and remote
refs.  But if that single globbing refspec happens to match a lot of
refs, then the refspec matching in 'git send-pack' has a whole lot to
do, spins the CPU like crazy, and there you are writing a bug report
on Friday evening.

This is less of an issue with other protocols, because they perform
refspec matching only once, but of course all protocols suffer if you
pass a lot of refspecs to 'git push' or 'git send-pack'.

> Whatever it is, is it perhaps
> something Git should actually print some sort of status for? (I've
> reproduced this long silence with both Git 2.20.1 and the new Git
> 2.27.0-rc0.)

An immediate band-aid might be to teach 'git push' to pass on the
original refspecs to 'git send-pack', as this would reduce the
complexity of that second refspec matching.  This, of course,
wouldn't help if someone scripted around 'git push' and invoked it
with a lot of refspecs, or fed a lot of refspecs directly to 'git
send-pack' on its stdin.

Alternatively, teach 'git send-pack' a new option, e.g.
'--only-fully-qualified-refspecs', and teach 'git push' to use it, so
'git send-pack' doesn't have to perform that second refspec matching;
it would only have to verify that the refspecs it got are indeed all
fully qualified.
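
The verification step could be as cheap as one pass over the input.
A hedged Python sketch follows; the option above does not exist today,
the helper names are hypothetical, and I'm assuming "fully qualified"
means both sides start with "refs/" and contain no glob (so deletion
refspecs with an empty lhs are rejected here):

```python
def is_fully_qualified(refspec):
    """True if both sides of "src:dst" start with "refs/" and have no "*"."""
    # An optional leading "+" only marks a forced update; ignore it.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, sep, dst = spec.partition(":")
    if not sep:
        return False  # no explicit destination, so matching would be needed
    return all(side.startswith("refs/") and "*" not in side
               for side in (src, dst))


def verify_refspecs(refspecs):
    """Return the refspecs that would still need real matching."""
    return [spec for spec in refspecs if not is_fully_qualified(spec)]


rejected = verify_refspecs([
    "refs/heads/main:refs/heads/main",
    "+refs/tags/v1:refs/tags/v1",
    "main:main",
    "refs/heads/*:refs/heads/*",
])
print(rejected)  # ['main:main', 'refs/heads/*:refs/heads/*']
```

This is O(nr of refspecs) and never looks at the local or remote ref
lists at all.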

Or build the remote refs index earlier and sort refspecs and local
refs, so we could match the lhs of fully qualified refspecs to local
refs in one go while looking up their rhs in the remote ref index,
resulting in O((nr of refspecs + nr of local refs) * log(nr of remote
refs)) complexity.  Dunno, it was a long time ago when I last thought
about this.
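
Roughly what I have in mind, as a Python sketch rather than anything
resembling the real C code.  The function name is made up, and instead
of sorting and merging refspecs with local refs I use a set for the
lhs lookup, which is a simplification; the sorted list plus binary
search stands in for the remote ref index:

```python
import bisect


def match_with_index(refspecs, local_refs, remote_refs):
    """Match fully qualified "src:dst" refspecs without rescanning ref lists."""
    remote_index = sorted(remote_refs)   # built once, O(R log R)
    local_set = set(local_refs)          # built once, O(L)
    result = []
    for spec in refspecs:
        src, dst = spec.split(":", 1)
        if src not in local_set:
            continue                     # lhs names no local ref
        # Binary search for the rhs: O(log R) per refspec.
        pos = bisect.bisect_left(remote_index, dst)
        exists = pos < len(remote_index) and remote_index[pos] == dst
        result.append((src, dst, exists))
    return result


matched = match_with_index(
    ["refs/heads/a:refs/heads/a",
     "refs/heads/b:refs/heads/b",
     "refs/heads/zz:refs/heads/zz"],
    local_refs=["refs/heads/a", "refs/heads/b"],
    remote_refs=["refs/heads/b"],
)
print(matched)
```

Each tuple says which local ref goes to which remote name and whether
that remote ref already exists (update) or not (create).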

All this assumes that if there are a lot of refspecs, then they are
fully qualified.  I'd assume that if there are so many refspecs to
cause trouble, then they were generated programmatically, and I'd
(naively? :) assume that if something generates refspecs, then it's
careful and generates fully qualified refspecs.  Anyway, all bets are
off if there are a lot of non-fully-qualified refspecs...


