On July 7, 2021 6:38 PM, Junio C Hamano wrote:
>To: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>Cc: git@xxxxxxxxxxxxxxx; Jeff King <peff@xxxxxxxx>
>Subject: Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
>
>Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> Change a couple of users of strbuf_init() that pass a hint of 8192 to
>> pass STRBUF_HINT_SIZE instead.
>>
>> Both of these hardcoded occurrences pre-date the use of the strbuf
>> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
>> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
>> 2007-09-06).
>>
>> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
>> commit buffers I think 1024 or 2048 would probably be a better default
>> (this commit message is getting this commit close to the former, but I
>> daresay it's already way above the average for git commits).
>
>Yes, they are arbitrary within the context of these callers.
>
>I do not think using the STRBUF_HINT_SIZE macro in them is the right
>thing to do at all, as there is no reason to think that the best value
>for the write chunk sizes in these codepaths has any linkage to the
>best value for the read chunk sizes used by strbuf_read(). When
>benchmarking reveals that the best default size for strbuf_read() is
>16k, you'd update STRBUF_HINT_SIZE to 16k, but how do you tell that it
>also happens to be the best write buffer size for the cache-tree
>writeout codepath (answer: you don't)?

And benchmark results are going to be highly platform dependent, as we
have seen with our exotic platform.

-Randall
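
For readers following along, here is a minimal standalone sketch of the
pattern the patch proposes. The struct strbuf and strbuf_init() below are
simplified stand-ins for Git's internal strbuf.h, reproduced only so the
example compiles on its own; STRBUF_HINT_SIZE mirrors the macro the patch
introduces, and the before/after calls show the substitution at a call site.

/*
 * Simplified stand-ins for Git's internal strbuf API (strbuf.h),
 * reproduced here only so this sketch compiles on its own.
 */
#include <stdlib.h>

#define STRBUF_HINT_SIZE 8192	/* the macro the patch introduces */

struct strbuf {
	size_t alloc;	/* bytes allocated */
	size_t len;	/* bytes used */
	char *buf;
};

/* Pre-size the buffer so early appends avoid repeated realloc(). */
static void strbuf_init(struct strbuf *sb, size_t hint)
{
	sb->len = 0;
	sb->alloc = hint ? hint + 1 : 0;	/* +1 reserves room for the NUL */
	sb->buf = sb->alloc ? malloc(sb->alloc) : NULL;
	if (sb->buf)
		sb->buf[0] = '\0';
}

int main(void)
{
	struct strbuf before, after;

	strbuf_init(&before, 8192);		/* hardcoded hint, pre-patch */
	strbuf_init(&after, STRBUF_HINT_SIZE);	/* named constant, post-patch */

	free(before.buf);
	free(after.buf);
	return 0;
}

Junio's objection in the quoted reply is about coupling: once read and
write chunk sizes share one macro, tuning strbuf_read() retunes the
cache-tree writeout path too. Keeping the per-callsite literals, or
splitting out per-purpose constants (say, a hypothetical STRBUF_READ_HINT
alongside write-side values), would let each be benchmarked independently.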