On Fri, May 15, 2020 at 09:09:27PM -0700, Bryan Turner wrote:
> When running a huge "git push" via protocol v0/v1 over HTTP

By huge push you mean a lot of refs?

> (repository is ~10GB, with ~104,000 refs), I observe that:
> * Git makes an initial connection for a ref advertisement. This
>   completes almost instantly because the repository is empty
> * "git push" then sits in absolute silence for ~10 minutes

I ran into this a few years ago; I remember waiting for 57 minutes ;)

> The process chain looks like:
> git push <URL>
> git-remote-http <URL> <URL>
> git send-pack --stateless-rpc --helper-status --thin
> --progress <URL> --stdin
>
> The "git send-pack" process runs at 100% usage for a single CPU core
> for this entire duration. Does anyone have any insight into what Git
> might be doing during this long delay?

Refspec matching is, if I recall correctly, O(nr of refspecs * (nr of
local refs + nr of remote refs)), with remote.c:count_refspec_match()
responsible for the "nr of remote + local refs" part and
remote.c:match_explicit_refs() for the "nr of refspecs" part.

This is particularly bad for the http/https protocols, because 'git
push' expands your refspecs to fully qualified refspecs and passes them
to 'git send-pack', which then performs refspec matching _again_.  So
if you have a single refspec with globbing, then 'git push' can still
do the refspec matching fairly quickly, even if there are a lot of
local and remote refs and that single globbing refspec happens to match
a lot of refs; but then the refspec matching in 'git send-pack' has a
whole lot to do, spins the CPU like crazy, and there you are writing a
bug report on Friday evening.

This is less of an issue with other protocols, because they perform
refspec matching only once, but of course all protocols suffer if you
pass a lot of refspecs to 'git push' or 'git send-pack'.

> Whatever it is, is it perhaps
> something Git should actually print some sort of status for?
> (I've reproduced this long silence with both Git 2.20.1 and the new
> Git 2.27.0-rc0.)

An immediate band-aid might be to teach 'git push' to pass on the
original refspecs to 'git send-pack', as this would reduce the
complexity of that second round of refspec matching.  This, of course,
wouldn't help if someone scripted around 'git push' and invoked it with
a lot of refspecs, or fed a lot of refspecs directly to the stdin of
'git send-pack'.

Alternatively, teach 'git send-pack' a new option, e.g.
'--only-fully-qualified-refspecs', and teach 'git push' to use it, so
'git send-pack' doesn't have to perform that second round of refspec
matching; it would only have to verify that the refspecs it got are
indeed all fully qualified.

Or build the remote refs index earlier and sort refspecs and local
refs, so we could match the lhs of fully qualified refspecs to local
refs in one go while looking up their rhs in the remote ref index,
resulting in O((nr of refspecs + nr of local refs) * log(nr of remote
refs)) complexity.

Dunno, it was a long time ago when I last thought about this.

All this assumes that if there are a lot of refspecs, then they are
fully qualified.  I'd assume that if there are so many refspecs to
cause trouble, then they were generated programmatically, and I'd
(naively? :) assume that if something generates refspecs, then it's
careful and generates fully qualified refspecs.  Anyway, all bets are
off if there are a lot of non-fully-qualified refspecs...
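
For illustration only, the difference between the per-refspec scan and
the "sort and index" alternative described above can be sketched in
Python.  This is not Git's C code: the function names are hypothetical,
and the sketch assumes that every refspec is a fully qualified
"src:dst" pair that must match a ref name exactly (no globbing, no '+'
prefix).

```python
from bisect import bisect_left


def match_naive(refspecs, local_refs, remote_refs):
    """Scan all local and all remote refs once per refspec.

    Cost grows as nr of refspecs * (nr of local refs + nr of remote refs).
    """
    matches = []
    for spec in refspecs:
        src, dst = spec.split(":", 1)
        # Linear scans, loosely in the spirit of counting matches per refspec.
        local_hits = sum(1 for ref in local_refs if ref == src)
        remote_hits = sum(1 for ref in remote_refs if ref == dst)
        if local_hits:
            matches.append((src, dst, remote_hits > 0))
    return matches


def match_indexed(refspecs, local_refs, remote_refs):
    """Sort once, walk refspecs and local refs together, and look up
    each rhs in a sorted remote ref index with a binary search."""
    remote_index = sorted(remote_refs)
    specs = sorted(tuple(spec.split(":", 1)) for spec in refspecs)
    local_sorted = sorted(local_refs)

    matches = []
    i = 0
    for src, dst in specs:
        # Advance through local refs; the cursor never moves backwards.
        while i < len(local_sorted) and local_sorted[i] < src:
            i += 1
        if i < len(local_sorted) and local_sorted[i] == src:
            pos = bisect_left(remote_index, dst)
            on_remote = pos < len(remote_index) and remote_index[pos] == dst
            matches.append((src, dst, on_remote))
    return matches
```

Each result is a (src, dst, exists-on-remote) tuple for a refspec whose
lhs names an existing local ref.  The indexed variant pays for the
sorting up front, but after that the walk over refspecs and local refs
is linear and each remote lookup is logarithmic.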