Re: [PATCH v2] cat-file: skip expanding default format

John Cai <johncai86@xxxxxxxxx> · Tue, 08 Mar 2022 17:06:41 -0500

Hi Taylor

On 8 Mar 2022, at 17:00, Taylor Blau wrote:

> On Tue, Mar 08, 2022 at 02:54:23AM +0000, John Cai via GitGitGadget wrote:
>>  /*
>>   * If "pack" is non-NULL, then "offset" is the byte offset within the pack from
>>   * which the object may be accessed (though note that we may also rely on
>> @@ -363,6 +371,11 @@ static void batch_object_write(const char *obj_name,
>>  			       struct packed_git *pack,
>>  			       off_t offset)
>>  {
>> +	struct strbuf type_name = STRBUF_INIT;
>> +
>> +	if (!opt->format)
>> +		data->info.type_name = &type_name;
>
> Hmmm, I'm not quite sure I understand why the extra string buffer is
> necessary here. When we do the first expansion with the DEFAULT_FORMAT,
> we set data->info.typep to point at data->type.
>
> So by the time we do our actual query (either with
> `packed_object_info()` or just `oid_object_info_extended()`), we will
> get the type filled into data->type.
>
> And we should be able to use that in print_default_format(), which saves
> us in a couple of ways:
>
>   - We don't have to (stack) allocate a string buffer, which then needs
>     to be cleaned up after printing each object.
>
>   - (More importantly) we can avoid the extra string copy through the
>     intermediate buffer by using type_name() directly.

Agree with your reasoning here.

>
> Here's a small patch on top that you could apply, if you're interested:
>
> --- >8 ---
>
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index ab9a49e13a..9f55afa75b 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -355,5 +355,5 @@ static int print_default_format(char *buf, int len, struct expand_data *data)
>  {
>  	return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
> -		 data->info.type_name->buf,
> +			 type_name(data->type),
>  		 (uintmax_t)*data->info.sizep);
>
> @@ -372,9 +372,4 @@ static void batch_object_write(const char *obj_name,
>  			       off_t offset)
>  {
> -	struct strbuf type_name = STRBUF_INIT;
> -
> -	if (!opt->format)
> -		data->info.type_name = &type_name;
> -
>  	if (!data->skip_object_info) {
>  		int ret;
> @@ -391,5 +386,5 @@ static void batch_object_write(const char *obj_name,
>  			       obj_name ? obj_name : oid_to_hex(&data->oid));
>  			fflush(stdout);
> -			goto cleanup;
> +			return;
>  		}
>  	}
> @@ -410,7 +405,4 @@ static void batch_object_write(const char *obj_name,
>  		batch_write(opt, "\n", 1);
>  	}
> -
> -cleanup:
> -	strbuf_release(&type_name);
>  }
>
> --- 8< ---
>
> On my copy of git.git., it shaves off around ~7ms that we're spending
> just copying type names back and forth.
>
>     (without the above)
>     Benchmark 1: git.compile cat-file --batch-check --batch-all-objects
>       Time (mean ± σ):     578.7 ms ±  12.7 ms    [User: 553.4 ms, System: 25.2 ms]
>       Range (min … max):   568.1 ms … 600.1 ms    10 runs
>
>     (with the above)
>     Benchmark 1: git.compile cat-file --batch-check --batch-all-objects
>       Time (mean ± σ):     571.5 ms ±   7.9 ms    [User: 544.0 ms, System: 27.3 ms]
>       Range (min … max):   558.8 ms … 583.2 ms    10 runs

Thanks for this suggestion and the benchmark! This is similar to what Junio suggested, and
I do think this is the right thing to do here.

>
> Thanks,
> Taylor

Thanks,
John