On Tue, Nov 07, 2023 at 01:25:59AM +0000, Victoria Dye via GitGitGadget wrote: > From: Victoria Dye <vdye@xxxxxxxxxx> > > Update 'filter_and_format_refs()' to try to perform ref filtering & > formatting in a single ref iteration, without an intermediate 'struct > ref_array'. This can only be done if no operations need to be performed on a > pre-filtered array; specifically, if the refs are > > - filtered on reachability, > - sorted, or > - formatted with ahead-behind information > > they cannot be filtered & formatted in the same iteration. In that case, > fall back on the current filter-then-sort-then-format flow. > > This optimization substantially improves memory usage due to no longer > storing a ref array in memory. In some cases, it also dramatically reduces > runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads > all refs into a 'struct ref_array' to printing only the first ref). > > Signed-off-by: Victoria Dye <vdye@xxxxxxxxxx> > --- > ref-filter.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 74 insertions(+), 6 deletions(-) > > diff --git a/ref-filter.c b/ref-filter.c > index ff00ab4b8d8..384cf1595ff 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -2863,6 +2863,44 @@ static void free_array_item(struct ref_array_item *item) > free(item); > } > > +struct ref_filter_and_format_cbdata { > + struct ref_filter *filter; > + struct ref_format *format; > + > + struct ref_filter_and_format_internal { > + int count; > + } internal; > +}; > + > +static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data) > +{ > + struct ref_filter_and_format_cbdata *ref_cbdata = cb_data; > + struct ref_array_item *ref; > + struct strbuf output = STRBUF_INIT, err = STRBUF_INIT; > + > + ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter); > + if (!ref) > + return 0; > + > + if (format_ref_array_item(ref, ref_cbdata->format, &output, &err)) > + die("%s", err.buf); > + > + if (output.len || !ref_cbdata->format->array_opts.omit_empty) { > + fwrite(output.buf, 1, output.len, stdout); > + putchar('\n'); > + } > + > + strbuf_release(&output); > + strbuf_release(&err); > + free_array_item(ref); > + > + if (ref_cbdata->format->array_opts.max_count && > + ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count) > + return -1; It feels a bit weird to return a negative value here, which usually indicates that an error has happened whereas we only use it here to abort the iteration. But we ignore the return value of `do_iterate_refs()` anyway, so it doesn't make much of a difference. > + return 0; > +} > + > /* Free all memory allocated for ref_array */ > void ref_array_clear(struct ref_array *array) > { > @@ -3046,16 +3084,46 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int > return ret; > } > > +static inline int can_do_iterative_format(struct ref_filter *filter, > + struct ref_sorting *sorting, > + struct ref_format *format) > +{ > + /* > + * Refs can be filtered and formatted in the same iteration as long > + * as we aren't filtering on reachability, sorting the results, or > + * including ahead-behind information in the formatted output. > + */ Do we want to format this as a bulleted list so that it's more readily extensible if we ever need to pay attention to new options here? Also, I noted that this commit doesn't add any new tests -- do we already exercise all of these conditions? More generally, I worry a bit about maintainability of this code snippet as we need to remember to always update this condition whenever we add a new option, and this can be quite easy to miss. The performance benefit might be worth the effort though. Patrick > + return !(filter->reachable_from || > + filter->unreachable_from || > + sorting || > + format->bases.nr); > +} > + > void filter_and_format_refs(struct ref_filter *filter, unsigned int type, > struct ref_sorting *sorting, > struct ref_format *format) > { > - struct ref_array array = { 0 }; > - filter_refs(&array, filter, type); > - filter_ahead_behind(the_repository, format, &array); > - ref_array_sort(sorting, &array); > - print_formatted_ref_array(&array, format); > - ref_array_clear(&array); > + if (can_do_iterative_format(filter, sorting, format)) { > + int save_commit_buffer_orig; > + struct ref_filter_and_format_cbdata ref_cbdata = { > + .filter = filter, > + .format = format, > + }; > + > + save_commit_buffer_orig = save_commit_buffer; > + save_commit_buffer = 0; > + > + do_filter_refs(filter, type, filter_and_format_one, &ref_cbdata); > + > + save_commit_buffer = save_commit_buffer_orig; > + } else { > + struct ref_array array = { 0 }; > + filter_refs(&array, filter, type); > + filter_ahead_behind(the_repository, format, &array); > + ref_array_sort(sorting, &array); > + print_formatted_ref_array(&array, format); > + ref_array_clear(&array); > + } > } > > static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b) > -- > gitgitgadget > >
Attachment:
signature.asc
Description: PGP signature