On 8 Mar 2022, at 18:09, John Cai wrote: > Hi Taylor, > > On 8 Mar 2022, at 17:30, Taylor Blau wrote: > >> On Tue, Mar 08, 2022 at 10:08:46PM +0000, John Cai via GitGitGadget wrote: >>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >>> index 7b3f42950ec..e2edba70b41 100644 >>> --- a/builtin/cat-file.c >>> +++ b/builtin/cat-file.c >>> @@ -351,6 +351,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >>> } >>> } >>> >>> +static int print_default_format(char *buf, int len, struct expand_data *data) >>> +{ >>> + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), >>> + type_name(data->type), >>> + (uintmax_t)*data->info.sizep); >>> +} >> >> Two small nits here. It looks like the indentation on the second and >> third lines is off a little bit, since we'd typically expect those to be >> indented to the same margin as the first argument to xsnprintf(). > > Thanks for bringing this up. I did have a question about indentation in this > case. for the second line, I did try to indent it to align with buf. I attempted > to do the same with the third line, but it's the ( that lines up with buf so > optically it looks a little off. > >> >> The other is that you're reading data->info.sizep by dereferencing it, >> but we know that it points to data->size. So I think there it makes >> sense to just read the value directly out of data->size, though note >> that you'll still need the cast to uintmax_t since you're formatting it >> with PRIuMAX. > > good point, I'll adjust this in the next version. > >> >>> + >>> /* >>> * If "pack" is non-NULL, then "offset" is the byte offset within the pack from >>> * which the object may be accessed (though note that we may also rely on >>> @@ -381,10 +388,16 @@ static void batch_object_write(const char *obj_name, >>> } >>> } >>> >>> - strbuf_reset(scratch); >>> - strbuf_expand(scratch, opt->format, expand_format, data); >>> - strbuf_addch(scratch, '\n'); >>> - batch_write(opt, scratch->buf, scratch->len); >>> + if (!opt->format) { >>> + char buf[1024]; >>> + int len = print_default_format(buf, 1024, data); >>> + batch_write(opt, buf, len); >> >> Just curious (and apologies if this was discussed earlier and I missed >> it), but: is there a reason that we have to use a scratch buffer here >> that is separate from the strbuf we already have allocated? >> >> That would avoid a large-ish stack variable, but it means that the two >> paths are a little more similar, and can share the batch_write call >> outside of the if/else statement. > > This was holdover code from before. Looks like the scratch buffer gets passed > in. Do you mean we don't need to allocate char buf[1024] and instead we can just > use scratch and pass it into print_default_format? something like this? diff --git a/builtin/cat-file.c b/builtin/cat-file.c index e2edba70b418..2336bcc80850 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -351,11 +351,11 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } -static int print_default_format(char *buf, int len, struct expand_data *data) +static void print_default_format(struct strbuf *scratch, struct expand_data *data) { - return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), - type_name(data->type), - (uintmax_t)*data->info.sizep); + strbuf_addf(scratch, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), + type_name(data->type), + (uintmax_t)data->size); } /* @@ -388,17 +388,17 @@ static void batch_object_write(const char *obj_name, } } + strbuf_reset(scratch); + if (!opt->format) { - char buf[1024]; - int len = print_default_format(buf, 1024, data); - batch_write(opt, buf, len); + print_default_format(scratch, data); } else { - strbuf_reset(scratch); strbuf_expand(scratch, opt->format, expand_format, data); strbuf_addch(scratch, '\n'); - batch_write(opt, scratch->buf, scratch->len); } + batch_write(opt, scratch->buf, scratch->len); + if (opt->print_contents) { print_object_or_die(opt, data); batch_write(opt, "\n", 1); > >> >> The rest of the changes in this file all look good to me. >> >>> diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh >>> new file mode 100755 >>> index 00000000000..e463623f5a3 >>> --- /dev/null >>> +++ b/t/perf/p1006-cat-file.sh >>> @@ -0,0 +1,16 @@ >>> +#!/bin/sh >>> + >>> +test_description='Basic sort performance tests' >> >> Is this description a hold-over from p0071? If so, it may be worth >> updating here. >> >>> +test_expect_success 'setup' ' >>> + git rev-list --all >rla >>> +' >>> + >>> +test_perf 'cat-file --batch-check' ' >>> + git cat-file --batch-check <rla >>> +' >> >> We could probably get away with dropping the setup test and using >> `--batch-all-objects` here. Note that right now you're only printing >> commit objects, so there would be a slight behavior change from the way >> the patch is currently written, but it should demonstrate the same >> performance improvement. > > This sounds good to me! > >> >> Thanks, >> Taylor