"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > The hashfile API uses a hard-coded buffer size of 8KB and has ever since > it was introduced in c38138c (git-pack-objects: write the pack files > with a SHA1 csum, 2005-06-26). It performs a similar function to the > hashing buffers in read-cache.c, but that code was updated from 8KB to > 128KB in f279894 (read-cache: make the index write buffer size 128K, > 2021-02-18). The justification there was that do_write_index() improves > from 1.02s to 0.72s. > > There is a buffer, check_buffer, that is used to verify the check_fd > file descriptor. When this buffer increases to 128K to fit the data > being flushed, it causes the stack to overflow the limits placed in the > test suite. By moving this to a static buffer, we stop using stack data > for this purpose, but we lose some thread-safety. This change makes it > unsafe to write to multiple hashfiles across different threads. So... is the longer term plan to heap-allocate the buffer itself, and replace the .buffer member of hashfile from an embedded char array to a pointer? The hashfile structure is initialized once per the entire file (like the on-disk index), and the same buffer will be used throughout the life of that hashfile instance, so it may not be too bad to allocate on the heap. Just after the previous step justified its simplification of its progress logic based on how small the buffer is, this step makes it 16 times as big, which felt a tiny bit dishonest. We probably should say somewhere that 128k is still small enough that the rewrite in the previous step is still valid ;-)