Re: [PATCH v3 0/2] send_ref buffering

On Tue, Aug 31, 2021 at 03:08:25PM +0200, Jacob Vosmaer wrote:

> > Does the 64k buffer actually improve things? Here are the timings I get
> > on a repo with ~1M refs (it's linux.git with one ref per commit).
> Thanks for challenging that. I have a repeatable benchmark where it
> matters, because each write syscall wakes up a chain of proxies
> between the user and git-upload-pack. Larger buffers mean fewer
> wake-ups. But then I tried to simplify my example by having sshd as
> the only intermediary, and in that experiment 64K buffers were not
> better than 4K buffers. I think that goes to show that picking a good
> buffer size is hard, and we'd be better off picking one specifically
> for Gitaly (and GitLab) that works with our stack.

Thanks for explaining. Yeah, I think leaving it as a custom thing makes
sense, then.
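To make the wake-up argument concrete, here is a minimal C sketch of buffering ref-advertisement pkt-lines. It is not the code from this series and not git's send_ref(). Each pkt-line is a 4-hex-digit length that counts the 4 length bytes themselves, followed by the payload. Lines are collected in a buffer and handed to write() only when the next one would not fit. `struct pkt_buf`, `pkt_buf_append()`, `pkt_buf_flush()` and `count_writes()` are hypothetical names, the refs are made up, and /dev/null stands in for the client connection.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffered pkt-line writer; a sketch, not git's send_ref() code. */
struct pkt_buf {
	int fd;       /* destination, e.g. the connection to the client */
	char *data;   /* storage of cap bytes */
	size_t cap;   /* buffer size, e.g. 4096 or 65536 */
	size_t len;   /* bytes currently buffered */
	long writes;  /* write() calls issued so far */
};

/* Hand buffered bytes to the kernel; each write() may wake up every proxy downstream. */
static int pkt_buf_flush(struct pkt_buf *b)
{
	size_t off = 0;

	while (off < b->len) {
		ssize_t n = write(b->fd, b->data + off, b->len - off);
		if (n < 0)
			return -1;
		off += (size_t)n;
		b->writes++;
	}
	b->len = 0;
	return 0;
}

/* Append one pkt-line: 4 hex digits giving the total length (prefix included), then the payload. */
static int pkt_buf_append(struct pkt_buf *b, const char *payload)
{
	size_t plen = strlen(payload);
	size_t total = plen + 4;
	char hdr[5];

	/* pkt-lines are capped at 65520 bytes; this sketch also assumes one line fits the buffer */
	assert(total <= 65520 && total <= b->cap);
	if (b->len + total > b->cap && pkt_buf_flush(b) < 0)
		return -1;
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned)total);
	memcpy(b->data + b->len, hdr, 4);
	memcpy(b->data + b->len + 4, payload, plen);
	b->len += total;
	return 0;
}

/* Advertise nrefs made-up refs to /dev/null (a stand-in sink) through a cap-byte
 * buffer and return how many write() calls that took, or -1 on error. */
static long count_writes(size_t cap, unsigned nrefs)
{
	char line[128];
	long result = -1;
	struct pkt_buf b = { -1, malloc(cap), cap, 0, 0 };

	b.fd = open("/dev/null", O_WRONLY);
	if (b.fd >= 0 && b.data) {
		unsigned i;
		for (i = 0; i < nrefs; i++) {
			/* an all-zero object name plus a fake branch name */
			snprintf(line, sizeof(line), "%040d refs/heads/branch-%u\n", 0, i);
			if (pkt_buf_append(&b, line) < 0)
				break;
		}
		/* final flush sends whatever is left in the buffer */
		if (i == nrefs && pkt_buf_flush(&b) == 0)
			result = b.writes;
	}
	if (b.fd >= 0)
		close(b.fd);
	free(b.data);
	return result;
}
```

With 1000 such refs (66,890 bytes in total), `count_writes(65536, 1000)` issues 2 write() calls, while `count_writes(4096, 1000)` needs at least 17. Fewer syscalls is not automatically faster, which matches the sshd-only experiment above: the payoff depends on how costly each wake-up is in the stack downstream.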

> >   Summary
> >     'GIT_REF_PARANOIA=1 git.compile upload-pack .' ran
> >       2.17 ± 0.02 times faster than 'git.compile upload-pack .'
> >
> > It's not exactly the intended use of that environment variable, but its
> > side effect is that we do not call has_object_file() on each ref tip.
> That is nice to know, but as a user of Git I don't know when it is or
> is not safe to skip those has_object_file() calls. If it's safe to
> skip them then Git should skip them always. If not, then I will err on
> the side of caution and keep the checks.

Yeah, the use of REF_PARANOIA there was just an easy illustration. IMHO
it would be reasonable for upload-pack to just assume that the refs
files are valid. If they aren't, then either:

  - the receiver is uninterested in those objects or already has them,
    so won't ask for them. They're happy either way.

  - the receiver _will_ ask for them, at which point we'd barf later in
    pack-objects when we try to access them.
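The two cases above can be modeled in a few lines of C. This is a toy sketch, not upload-pack's code. `struct toy_ref`, `toy_has_object()` (standing in for has_object_file()), `advertise()` and `serve_want()` are hypothetical names, and the object store is a hard-coded list. The sketch shows that skipping the per-tip check saves one lookup per ref during advertisement, while a missing object still produces an error as soon as a client asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not git's API: a ref names an object, and the "object store"
 * is a fixed list of object names that actually exist. */
struct toy_ref {
	const char *name;
	const char *oid;
};

static const char *const present[] = { "aaaa", "bbbb" };

/* Stand-in for has_object_file(): one lookup per call. */
static bool toy_has_object(const char *oid)
{
	for (size_t i = 0; i < sizeof(present) / sizeof(present[0]); i++)
		if (strcmp(present[i], oid) == 0)
			return true;
	return false;
}

/* Advertise refs. With check_tips set, every tip costs a lookup up front and
 * broken refs are silently skipped; without it, every ref is sent as-is.
 * Returns the number of refs advertised and adds the lookups made to *lookups. */
static int advertise(const struct toy_ref *refs, int n, bool check_tips, int *lookups)
{
	int sent = 0;

	assert(lookups != NULL);
	for (int i = 0; i < n; i++) {
		if (check_tips) {
			(*lookups)++;
			if (!toy_has_object(refs[i].oid))
				continue;
		}
		sent++; /* a real server would write this ref's pkt-line here */
	}
	return sent;
}

/* When the client later sends "want <oid>", the object must exist; a broken
 * ref only causes an error if the client actually asks for it (0 = ok, -1 = error). */
static int serve_want(const char *oid)
{
	return toy_has_object(oid) ? 0 : -1;
}
```

With three refs where one points at a missing object, the checking advertisement makes three lookups and hides the broken ref. The non-checking one makes no lookups and advertises all three, and `serve_want()` fails only for the broken one, which is the "barf later" case.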

There are some thoughts in this old thread, which introduced
GIT_REF_PARANOIA:

  https://lore.kernel.org/git/20150317073730.GA25267@xxxxxxxx/

I think I was mostly too cowardly to make the change back then. And I
hadn't considered that the performance implications would be all that
big (though I will say this million-ref repo is at the edge of what I'd
consider reasonable).

> > > I do think it would be nice to take the packet_writer
> > > interface further (letting it replace the static buf, and use stdio
> > > handles, and using it throughout upload-pack).
> > I would like that too, for the sake of neatness and general
> > performance, but I don't have the time to take on a larger project
> > like that at the moment.
> I gave solving the problem with packet_writer a couple of hours today.
> The diff gets too big, and I have too little confidence I'm not
> introducing deadlocks. This really is more work than I can bite off
> right now. Sorry!

Thanks for taking a look! I think we can proceed with your series for
now, then.

-Peff


