Re: [PATCH v2 1/1] upload-pack.c: increase output buffer size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 14 2021, Jacob Vosmaer wrote:

> When serving a fetch, git upload-pack copies data from a git
> pack-objects stdout pipe to its stdout. This commit increases the size
> of the buffer used for that copying from 8192 to 65515, the maximum
> sideband-64k packet size.
>
> Previously, this buffer was allocated on the stack. Because the new
> buffer size is nearly 64KB, we switch this to a heap allocation.
>
> On GitLab.com we use GitLab's pack-objects cache which does writes of
> 65515 bytes. Because of the default 8KB buffer size, propagating these
> cache writes requires 8 pipe reads and 8 pipe writes from
> git-upload-pack, and 8 pipe reads from Gitaly (our Git RPC service).
> If we increase the size of the buffer to the maximum Git packet size,
> we need only 1 pipe read and 1 pipe write in git-upload-pack, and 1
> pipe read in Gitaly to transfer the same amount of data. In benchmarks
> with a pure fetch and 100% cache hit rate workload we are seeing CPU
> utilization reductions of over 30%.
>
> Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx>
> ---
>  upload-pack.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index c78d55bc67..3b90fb69e6 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -194,7 +194,13 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  }
>  
>  struct output_state {
> -	char buffer[8193];
> +	/*
> +	 * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, because with
> +	 * sideband-64k the band designator takes up 1 byte of space. Because
> +	 * relay_pack_data keeps the last byte to itself, we make the buffer 1
> +	 * byte bigger than the intended maximum write size.
> +	 */
> +	char buffer[(LARGE_PACKET_DATA_MAX - 1) + 1];

Since X-1+1 = X shouldn't we just say X ? :)

Maybe this fixup is better, maybe not:
	
	diff --git a/upload-pack.c b/upload-pack.c
	index 3b90fb69e6d..10849110229 100644
	--- a/upload-pack.c
	+++ b/upload-pack.c
	@@ -195,12 +195,12 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
	 
	 struct output_state {
	 	/*
	-	 * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, because with
	-	 * sideband-64k the band designator takes up 1 byte of space. Because
	-	 * relay_pack_data keeps the last byte to itself, we make the buffer 1
	-	 * byte bigger than the intended maximum write size.
	+	 * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1,
	+	 * because with * sideband-64k the band designator takes up 1
	+	 * byte of space (see relay_pack_data() below). So
	+	 * LARGE_PACKET_DATA_MAX ends up being the right size.
	 	 */
	-	char buffer[(LARGE_PACKET_DATA_MAX - 1) + 1];
	+	char buffer[LARGE_PACKET_DATA_MAX];
	 	int used;
	 	unsigned packfile_uris_started : 1;
	 	unsigned packfile_started : 1;

>  	int used;
>  	unsigned packfile_uris_started : 1;
>  	unsigned packfile_started : 1;
> @@ -269,7 +275,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  			     const struct string_list *uri_protocols)
>  {
>  	struct child_process pack_objects = CHILD_PROCESS_INIT;
> -	struct output_state output_state = { { 0 } };
> +	struct output_state *output_state = xcalloc(1, sizeof(struct output_state));

I don't know when/if we need to worry about 8k v.s. ~65k on the stack,
especially as recv_sideband() has:

	char buf[LARGE_PACKET_MAX + 1];

But maybe our stack is already quite big here, I don't know...

>  	char progress[128];
>  	char abort_msg[] = "aborting due to possible repository "
>  		"corruption on the remote side.";
> @@ -404,7 +410,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  		}
>  		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
>  			int result = relay_pack_data(pack_objects.out,
> -						     &output_state,
> +						     output_state,
>  						     pack_data->use_sideband,
>  						     !!uri_protocols);
>  
> @@ -438,11 +444,12 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  	}
>  
>  	/* flush the data */
> -	if (output_state.used > 0) {
> -		send_client_data(1, output_state.buffer, output_state.used,
> +	if (output_state->used > 0) {
> +		send_client_data(1, output_state->buffer, output_state->used,
>  				 pack_data->use_sideband);
>  		fprintf(stderr, "flushed.\n");
>  	}
> +	free(output_state);
>  	if (pack_data->use_sideband)
>  		packet_flush(1);
>  	return;

But this looks fine either way. And yes, in reply to the question in the
cover letter it's fine to ignore the memory leaks we have when we call
die().



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux