On Tue, Aug 31, 2021 at 03:08:25PM +0200, Jacob Vosmaer wrote:

> > Does the 64k buffer actually improve things? Here are the timings I get
> > on a repo with ~1M refs (it's linux.git with one ref per commit).
>
> Thanks for challenging that. I have a repeatable benchmark where it
> matters, because each write syscall wakes up a chain of proxies
> between the user and git-upload-pack. Larger buffers mean fewer
> wake-ups. But then I tried to simplify my example by having sshd as
> the only intermediary, and in that experiment 64K buffers were not
> better than 4K buffers. I think that goes to show that picking a good
> buffer size is hard, and we'd be better off picking one specifically
> for Gitaly (and GitLab) that works with our stack.

Thanks for explaining. Yeah, I think leaving it as a custom thing makes
sense, then.

> > Summary
> >   'GIT_REF_PARANOIA=1 git.compile upload-pack .' ran
> >     2.17 ± 0.02 times faster than 'git.compile upload-pack .'
> >
> > It's not exactly the intended use of that environment variable, but its
> > side effect is that we do not call has_object_file() on each ref tip.
>
> That is nice to know, but as a user of Git I don't know when it is or
> is not safe to skip those has_object_file() calls. If it's safe to
> skip them then Git should skip them always. If not, then I will err on
> the side of caution and keep the checks.

Yeah, the use of GIT_REF_PARANOIA there was just an easy illustration.
IMHO it would be reasonable for upload-pack to just assume that the refs
files are valid. If they aren't, then either:

  - the receiver is uninterested in those objects or already has them,
    so it won't ask for them. They're happy either way.

  - the receiver _will_ ask for them, at which point we'd barf later in
    pack-objects when we try to access them.

There are some thoughts in this old thread, which introduced
GIT_REF_PARANOIA:

  https://lore.kernel.org/git/20150317073730.GA25267@xxxxxxxx/

I think I was mostly too cowardly to make the change back then.
I also hadn't expected the performance impact of those checks to be all
that big (though I will say this million-ref repo is at the edge of what
I'd consider reasonable).

> > > I do think it would be nice to take the packet_writer
> > > interface further (letting it replace the static buf, and use stdio
> > > handles, and using it throughout upload-pack).
> >
> > I would like that too, for the sake of neatness and general
> > performance, but I don't have the time to take on a larger project
> > like that at the moment.
>
> I gave solving the problem with packet_writer a couple of hours today.
> The diff gets too big, and I have too little confidence I'm not
> introducing deadlocks. This really is more work than I can bite off
> right now. Sorry!

Thanks for taking a look! I think we can proceed with your series for
now, then.

-Peff