> Allow combining filters such that only objects accepted by all filters > are shown. The motivation for this is to allow getting directory > listings without also fetching blobs. This can be done by combining > blob:none with tree:<depth>. There are massive repositories that have > larger-than-expected trees - even if you include only a single commit. First of all, patches 2 and 3 are straightforward and LGTM. On to patch 4... [snip] > The current usage requires passing the filter to rev-list in the > following form: > > --filter=<FILTER1> --filter=<FILTER2> ... > > Such usage is currently an error, so giving it a meaning is backwards- > compatible. > > The URL-encoding scheme is being introduced before the repeated flag > logic, and the user-facing documentation for URL-encoding is being > withheld until the repeated flag feature is implemented. The > URL-encoding is in general not meant to be used directly by the user, > and it is better to describe the URL-encoding feature in terms of the > repeated flag. As of this commit, we don't support such arguments passed to rev-list in this way, so I would write these paragraphs as: A combined filter supports any number of subfilters, and is written in the following form: combine:<filter 1>+<filter 2>+<filter 3> Certain non-alphanumeric characters in each filter must be URL-encoded. For now, combined filters must be specified in this form. In a subsequent commit, rev-list will support multiple --filter arguments which will have the same effect as specifying one filter argument starting with "combine:". > Helped-by: Emily Shaffer <emilyshaffer@xxxxxxxxxx> > Helped-by: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> > Helped-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Matthew DeVore <matvore@xxxxxxxxxx> > --- > list-objects-filter-options.c | 106 ++++++++++++++++++- > list-objects-filter-options.h | 17 ++- > list-objects-filter.c | 159 ++++++++++++++++++++++++++++ > t/t6112-rev-list-filters-objects.sh | 151 +++++++++++++++++++++++++- > url.c | 6 ++ > url.h | 8 ++ > 6 files changed, 441 insertions(+), 6 deletions(-) > > @@ -28,22 +34,20 @@ static int gently_parse_list_objects_filter( > struct strbuf *errbuf) > { > const char *v0; > > if (filter_options->choice) { > strbuf_addstr( > errbuf, _("multiple filter-specs cannot be combined")); > return 1; > } > > - filter_options->filter_spec = strdup(arg); > - This line has been removed from gently_parse_list_objects_filter() because this function gains another caller that does not need it. To compensate, this line has been added to both its existing callers. > @@ -31,27 +32,37 @@ struct list_objects_filter_options { > * the filtering algorithm to use. > */ > enum list_objects_filter_choice choice; > > /* > * Choice is LOFC_DISABLED because "--no-filter" was requested. > */ > unsigned int no_filter : 1; > > /* > - * Parsed values (fields) from within the filter-spec. These are > - * choice-specific; not all values will be defined for any given > - * choice. > + * BEGIN choice-specific parsed values from within the filter-spec. Only > + * some values will be defined for any given choice. > */ > + > struct object_id *sparse_oid_value; > unsigned long blob_limit_value; > unsigned long tree_exclude_depth; > + > + /* LOFC_COMBINE values */ > + > + /* This array contains all the subfilters which this filter combines. */ > + size_t sub_nr, sub_alloc; > + struct list_objects_filter_options *sub; > + > + /* > + * END choice-specific parsed values. > + */ > }; I still think it's cleaner to just have a "left subfilter" and "right subfilter", but I don't feel strongly about it. In any case, this is an internal detail and can always be changed in the future. > + /* > + * Optional. If this function is supplied and the filter needs to > + * collect omits, then this function is called once before free_fn is > + * called. > + */ > + void (*finalize_omits_fn)(struct oidset *omits, void *filter_data); This is needed because a combined filter's omits actually lie in the subfilters. Resolving it this way means that callers must call list_objects_filter__free() before using the omits set. Can you add documentation to __init() (which is the first function to take in the omits set) and __free() describing this? (As stated in the test below, we cannot just share one omits set amongst all the subfilters - see filter_trees_update_omits and the call site that relies on its return value.) Here comes the tricky part... > +static int should_delegate(enum list_objects_filter_situation filter_situation, > + struct object *obj, > + struct subfilter *sub) > +{ > + if (!sub->is_skipping_tree) > + return 1; > + if (filter_situation == LOFS_END_TREE && > + oideq(&obj->oid, &sub->skip_tree)) { > + sub->is_skipping_tree = 0; > + return 1; > + } > + return 0; > +} Optional: I think this should be called "test_and_set_skip_tree" or something like that, made to return the inverse of its current return value, and documented: Returns the value of sub->is_skipping_tree at the moment of invocation. If iteration is at the LOFS_END_TREE of the tree currently being skipped, first clears sub->is_skipping_tree before returning. > +static enum list_objects_filter_result process_subfilter( > + struct repository *r, > + enum list_objects_filter_situation filter_situation, > + struct object *obj, > + const char *pathname, > + const char *filename, > + struct subfilter *sub) > +{ > + enum list_objects_filter_result result; > + > + /* > + * Check should_delegate before oidset_contains so that > + * is_skipping_tree gets unset even when the object is marked as seen. > + * As of this writing, no filter uses LOFR_MARK_SEEN on trees that also > + * uses LOFR_SKIP_TREE, so the ordering is only theoretically > + * important. Be cautious if you change the order of the below checks > + * and more filters have been added! > + */ > + if (!should_delegate(filter_situation, obj, sub)) > + return LOFR_ZERO; > + if (oidset_contains(&sub->seen, &obj->oid)) > + return LOFR_ZERO; > + > + result = list_objects_filter__filter_object( > + r, filter_situation, obj, pathname, filename, sub->filter); > + > + if (result & LOFR_MARK_SEEN) > + oidset_insert(&sub->seen, &obj->oid); > + > + if (result & LOFR_SKIP_TREE) { > + sub->is_skipping_tree = 1; > + sub->skip_tree = obj->oid; > + } > + > + return result; > +} Looks good. > +static enum list_objects_filter_result filter_combine( > + struct repository *r, > + enum list_objects_filter_situation filter_situation, > + struct object *obj, > + const char *pathname, > + const char *filename, > + struct oidset *omits, > + void *filter_data) > +{ > + struct combine_filter_data *d = filter_data; > + enum list_objects_filter_result combined_result = > + LOFR_DO_SHOW | LOFR_MARK_SEEN | LOFR_SKIP_TREE; > + size_t sub; > + > + for (sub = 0; sub < d->nr; sub++) { > + enum list_objects_filter_result sub_result = process_subfilter( > + r, filter_situation, obj, pathname, filename, > + &d->sub[sub]); > + if (!(sub_result & LOFR_DO_SHOW)) > + combined_result &= ~LOFR_DO_SHOW; > + if (!(sub_result & LOFR_MARK_SEEN)) > + combined_result &= ~LOFR_MARK_SEEN; > + if (!d->sub[sub].is_skipping_tree) > + combined_result &= ~LOFR_SKIP_TREE; > + } > + > + return combined_result; > +} And also looks good. Might be confusing for tree skipping to be communicated through is_skipping_tree instead of the return value, but is_skipping_tree needs to be set anyway for other reasons, so that's convenient.