Re: [PATCH] read-cache: make the index write buffer size 128K

Neeraj Singh <nksingh85@xxxxxxxxx> · Fri, 19 Feb 2021 23:56:54 -0800

On Fri, Feb 19, 2021 at 11:46 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> writes:
>
> > On 2/17/21 9:48 PM, Neeraj K. Singh via GitGitGadget wrote:
> >> From: Neeraj Singh <neerajsi@xxxxxxxxxxxxxxxxxxx>
> >>
> >> Writing an index 8K at a time invokes the OS filesystem and caching code
> >> very frequently, introducing noticeable overhead while writing large
> >> indexes. When experimenting with different write buffer sizes on Windows
> >> writing the Windows OS repo index (260MB), most of the benefit came by
> >> bumping the index write buffer size to 64K. I picked 128K to ensure that
> >> we're past the knee of the curve.
> >>
> >> With this change, the time under do_write_index for an index with 3M
> >> files goes from ~1.02s to ~0.72s.
> >
> > [...]
> >
> >> -#define WRITE_BUFFER_SIZE 8192
> >> +#define WRITE_BUFFER_SIZE (128 * 1024)
> >>  static unsigned char write_buffer[WRITE_BUFFER_SIZE];
> >>  static unsigned long write_buffer_len;
> >
> > [...]
> >
> > Very nice.
>
> I wonder if we would gain more by going to, say, a 4M buffer size or
> even larger?
>
> Is this something the system could auto-tune by itself?  This is
> not about reading but writing, so we already have enough information
> to estimate how much we would need to write out.
>
> Thanks.
>

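Hi Junio,

At some point the cost of the memcpy into the filesystem cache begins to
dominate the cost of the system call, so increasing the buffer size has
diminishing returns. As noted in the commit message, most of the benefit
had already arrived by 64K, so I would not expect a 4M buffer to gain
much over 128K.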
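
To put rough numbers on that, here is a back-of-the-envelope sketch. It
is not code from the patch: flush_count() and model_cost() are
hypothetical helpers, and the model assumes every flush is one write
call with a fixed per-call overhead, plus a per-byte copy cost that does
not depend on the buffer size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of write calls needed to push
 * total_bytes through a buffer of buf_size bytes (ceiling division).
 */
static size_t flush_count(size_t total_bytes, size_t buf_size)
{
	assert(buf_size > 0);
	return (total_bytes + buf_size - 1) / buf_size;
}

/*
 * Hypothetical cost model in arbitrary units: a fixed overhead per
 * call plus a copy cost per byte. Only the first term shrinks as the
 * buffer grows; the second is the same for every buffer size.
 */
static double model_cost(size_t total_bytes, size_t buf_size,
			 double per_call, double per_byte)
{
	return (double)flush_count(total_bytes, buf_size) * per_call
		+ (double)total_bytes * per_byte;
}
```

Treating the 260MB index as 260 MiB, flush_count() gives 33280 calls at
8K, 2080 at 128K and 65 at 4M. Going from 8K to 128K removes 31200
calls; going from 128K to 4M can only remove another 2015, while the
copy term stays untouched.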

An alternative approach would be to mmap the index file we are trying to
write and thereby copy the data directly into the filesystem cache pages.
That's a much more difficult change to make and verify, so I'd rather
leave that as an exercise for the reader for now :).
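
For the curious, the bare idea looks roughly like the POSIX sketch
below. write_via_mmap() is a hypothetical helper, not anything in git,
and it assumes the final size is known up front. It ignores everything
that makes the real change hard: the lockfile, the trailing hash, and
the Windows equivalents of these calls.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path through a shared
 * mapping instead of write(2). Returns 0 on success, -1 on error.
 */
static int write_via_mmap(const char *path, const void *data, size_t len)
{
	int fd, ret = -1;
	void *map;

	if (!len)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;

	/* The file must already be as large as the mapping. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto out;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* This copy goes straight into the cache pages backing the file. */
	memcpy(map, data, len);

	/*
	 * munmap() does not report write-back errors; a caller that
	 * needs durability would also msync() or fsync() here.
	 */
	ret = munmap(map, len);
out:
	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

Note that the size has to be fixed before the first byte is copied,
which is where the "we already know how much we will write" observation
would come in.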

Thanks,
-Neeraj