On Tue, Aug 31, 2021 at 12:25 PM Jeff King <peff@xxxxxxxx> wrote:

> I do think it would be nice to take the packet_writer
> interface further (letting it replace the static buf, and use stdio
> handles, and using it throughout upload-pack).

I would like that too, for the sake of neatness and general
performance, but I don't have the time to take on a larger project
like that at the moment.

> Does the 64k buffer actually improve things? Here are the timings I get
> on a repo with ~1M refs (it's linux.git with one ref per commit).

Thanks for challenging that. I have a repeatable benchmark where it
matters, because each write syscall wakes up a chain of proxies
between the user and git-upload-pack. Larger buffers mean fewer
wake-ups.

But then I tried to simplify my example by having sshd as the only
intermediary, and in that experiment 64K buffers were no better than
4K buffers. I think that goes to show that picking a good buffer size
is hard, and we'd be better off picking one specifically for Gitaly
(and GitLab) that works with our stack.

> Summary
>   'GIT_REF_PARANOIA=1 git.compile upload-pack .' ran
>     2.17 ± 0.02 times faster than 'git.compile upload-pack .'
>
> It's not exactly the intended use of that environment variable, but its
> side effect is that we do not call has_object_file() on each ref tip.

That is nice to know, but as a user of Git I don't know when it is or
is not safe to skip those has_object_file() calls. If it is always
safe to skip them, then Git should always skip them. If not, I will
err on the side of caution and keep the checks.

Jacob