Re: [PATCH 2/2] upload-pack: use stdio in send_ref callbacks

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 Aug 2021 09:15:10 -0700

Jeff King <peff@xxxxxxxx> writes:

> On Thu, Aug 26, 2021 at 09:33:08AM -0700, Junio C Hamano wrote:
>
>> > +	/*
>> > +	 * Increase the stdio buffer size for stdout, for the benefit of ref
>> > +	 * advertisement writes. We are only allowed to call setvbuf(3) "after
>> > +	 * opening a stream and before any other operations have been performed
>> > +	 * on it", so let's call it before we have written anything to stdout.
>> > +	 */
>> > +	if (setvbuf(stdout, xmalloc(LARGE_PACKET_MAX), _IOFBF,
>> > +			LARGE_PACKET_MAX))
>> > +		die_errno("failed to grow stdout buffer");
>> 
>> Nice to see a comment on the tricky part.  I do not think we mind if
>> we rounded up the allocation size to the next power of two here, but
>> there probably won't be any measurable benefit for doing so.
>
> I'm a little negative on this part, actually. The commit message claims
> it's for consistency across platforms. But I would argue that if your
> libc buffer size is sub-optimal, then we shouldn't be sprinkling these
> adjustments through the code. Either:
>
>  - we should consider it a quality-of-implementation issue, and people
>    using that libc should push back on their platform to change the
>    default size; or
>
>  - we should fix it consistently and transparently throughout Git by
>    adjusting std{in,out,err} in common-main, and using fopen()/fdopen()
>    wrappers to adjust to something more sensible

I agree that it would be ideal if we didn't do setvbuf() at all, the
second best would be to do so at a central place.  My "Nice" applied
only to the comment ;-)

> I also find the use of LARGE_PACKET_MAX weird here. Is that really the
> optimal stdio buffer size? The whole point of this is coalescing _small_
> packets into a single write() syscall. So how many small packets fit
> into LARGE_PACKET_MAX is comparing apples and oranges.

The sizing is all my fault.  The original used 32k threashold and
implemented manual buffering by flushing whenever the accumulated
data exceeded the threashold, as opposed to "if we buffer this new
piece, it would exceed, so let's flush first what we got so far,
which is smaller than the threashold", which I found indefensible in
two ways.  The "flush _after_ we go over it" semantics looked iffy,
and 32k was totally out of thin air.  As LARGE_PACKET_MAX is the
hard upper limit of each packet that has been with us forever, it
was more defensible than 32k ;-)

But if we are using stdio, I agree that it is much better not to
worry about sizing at all by not doing setvbuf() and leaving it to
libc implementation.  They ought to know what works on their
platform the best.

Thanks.