On Tue, Dec 14 2021, Jacob Vosmaer wrote: > When serving a fetch, git upload-pack copies data from a git > pack-objects stdout pipe to its stdout. This commit increases the size > of the buffer used for that copying from 8192 to 65515, the maximum > sideband-64k packet size. > > Previously, this buffer was allocated on the stack. Because the new > buffer size is nearly 64KB, we switch this to a heap allocation. > > On GitLab.com we use GitLab's pack-objects cache which does writes of > 65515 bytes. Because of the default 8KB buffer size, propagating these > cache writes requires 8 pipe reads and 8 pipe writes from > git-upload-pack, and 8 pipe reads from Gitaly (our Git RPC service). > If we increase the size of the buffer to the maximum Git packet size, > we need only 1 pipe read and 1 pipe write in git-upload-pack, and 1 > pipe read in Gitaly to transfer the same amount of data. In benchmarks > with a pure fetch and 100% cache hit rate workload we are seeing CPU > utilization reductions of over 30%. > > Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx> > --- > upload-pack.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/upload-pack.c b/upload-pack.c > index c78d55bc67..3b90fb69e6 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -194,7 +194,13 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data) > } > > struct output_state { > - char buffer[8193]; > + /* > + * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, because with > + * sideband-64k the band designator takes up 1 byte of space. Because > + * relay_pack_data keeps the last byte to itself, we make the buffer 1 > + * byte bigger than the intended maximum write size. > + */ > + char buffer[(LARGE_PACKET_DATA_MAX - 1) + 1]; Since X-1+1 = X shouldn't we just say X ? :) Maybe this fixup is better, maybe not: diff --git a/upload-pack.c b/upload-pack.c index 3b90fb69e6d..10849110229 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -195,12 +195,12 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data) struct output_state { /* - * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, because with - * sideband-64k the band designator takes up 1 byte of space. Because - * relay_pack_data keeps the last byte to itself, we make the buffer 1 - * byte bigger than the intended maximum write size. + * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, + * because with * sideband-64k the band designator takes up 1 + * byte of space (see relay_pack_data() below). So + * LARGE_PACKET_DATA_MAX ends up being the right size. */ - char buffer[(LARGE_PACKET_DATA_MAX - 1) + 1]; + char buffer[LARGE_PACKET_DATA_MAX]; int used; unsigned packfile_uris_started : 1; unsigned packfile_started : 1; > int used; > unsigned packfile_uris_started : 1; > unsigned packfile_started : 1; > @@ -269,7 +275,7 @@ static void create_pack_file(struct upload_pack_data *pack_data, > const struct string_list *uri_protocols) > { > struct child_process pack_objects = CHILD_PROCESS_INIT; > - struct output_state output_state = { { 0 } }; > + struct output_state *output_state = xcalloc(1, sizeof(struct output_state)); I don't know when/if we need to worry about 8k v.s. ~65k on the stack, especially as recv_sideband() has: char buf[LARGE_PACKET_MAX + 1]; But maybe our stack is already quite big here, I don't know... > char progress[128]; > char abort_msg[] = "aborting due to possible repository " > "corruption on the remote side."; > @@ -404,7 +410,7 @@ static void create_pack_file(struct upload_pack_data *pack_data, > } > if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) { > int result = relay_pack_data(pack_objects.out, > - &output_state, > + output_state, > pack_data->use_sideband, > !!uri_protocols); > > @@ -438,11 +444,12 @@ static void create_pack_file(struct upload_pack_data *pack_data, > } > > /* flush the data */ > - if (output_state.used > 0) { > - send_client_data(1, output_state.buffer, output_state.used, > + if (output_state->used > 0) { > + send_client_data(1, output_state->buffer, output_state->used, > pack_data->use_sideband); > fprintf(stderr, "flushed.\n"); > } > + free(output_state); > if (pack_data->use_sideband) > packet_flush(1); > return; But this looks fine either way. And yes, in reply to the question in the cover letter it's fine to ignore the memory leaks we have when we call die().