On Fri, Jun 21, 2019 at 05:26:26PM -0700, Jonathan Tan wrote: > > Allow combining filters such that only objects accepted by all filters > > are shown. The motivation for this is to allow getting directory > > listings without also fetching blobs. This can be done by combining > > blob:none with tree:<depth>. There are massive repositories that have > > larger-than-expected trees - even if you include only a single commit. > > First of all, patches 2 and 3 are straightforward and LGTM. On to patch > 4... > > [snip] > > > The current usage requires passing the filter to rev-list in the > > following form: > > > > --filter=<FILTER1> --filter=<FILTER2> ... > > > > Such usage is currently an error, so giving it a meaning is backwards- > > compatible. > > > > The URL-encoding scheme is being introduced before the repeated flag > > logic, and the user-facing documentation for URL-encoding is being > > withheld until the repeated flag feature is implemented. The > > URL-encoding is in general not meant to be used directly by the user, > > and it is better to describe the URL-encoding feature in terms of the > > repeated flag. > > As of this commit, we don't support such arguments passed to rev-list in > this way, so I would write these paragraphs as: > > A combined filter supports any number of subfilters, and is written in > the following form: > > combine:<filter 1>+<filter 2>+<filter 3> > > Certain non-alphanumeric characters in each filter must be > URL-encoded. > > For now, combined filters must be specified in this form. In a > subsequent commit, rev-list will support multiple --filter arguments > which will have the same effect as specifying one filter argument > starting with "combine:". Done, but I've amended your last paragraph to include the excuse about the missing documentation: For now, combined filters must be specified in this form. In a subsequent commit, rev-list will support multiple --filter arguments which will have the same effect as specifying one filter argument starting with "combine:". The documentation will be updated in that commit, as the URL-encoding scheme is in general not meant to be used directly by the user, and it is better to describe the URL-encoding feature in terms of the repeated flag. > > + > > + /* LOFC_COMBINE values */ > > + > > + /* This array contains all the subfilters which this filter combines. */ > > + size_t sub_nr, sub_alloc; > > + struct list_objects_filter_options *sub; > > + > > + /* > > + * END choice-specific parsed values. > > + */ > > }; > > I still think it's cleaner to just have a "left subfilter" and "right > subfilter", but I don't feel strongly about it. In any case, this is an > internal detail and can always be changed in the future. > Interesting. I think there are two reasonable ways of thinking about it: a. we are parsing a nested left-associative binary expression, and it's helpful if the internal representation matches that semantic. (Maybe this is what you're thinking?) b. linked-lists are somewhat aberrational and should only be used in a very narrow class of code. Arrays are more universal and therefore more readable. I'll keep it as-is for now since I don't see a great reason to revert it to the previous style. Thank you for pointing this out - I *did* suspect there was a big subjective factor to people's preference for the array form. > > + /* > > + * Optional. If this function is supplied and the filter needs to > > + * collect omits, then this function is called once before free_fn is > > + * called. > > + */ > > + void (*finalize_omits_fn)(struct oidset *omits, void *filter_data); > > This is needed because a combined filter's omits actually lie in the > subfilters. Resolving it this way means that callers must call > list_objects_filter__free() before using the omits set. Can you add > documentation to __init() (which is the first function to take in the > omits set) and __free() describing this? > > (As stated in the test below, we cannot just share one omits set amongst > all the subfilters - see filter_trees_update_omits and the call site > that relies on its return value.) > I documented __init as follows: /* * Constructor for the set of defined list-objects filters. * The `omitted` set is optional. It is populated with objects that the * filter excludes. This set should not be considered finalized until * after list_objects_filter__free is called on the returned `struct * filter *`. */ And __free: /* * Destroys `filter` and finalizes the `omitted` set, if present. Does * nothing if `filter` is null. */ And finalize_omits_fn (which has the internal specifics): /* * Optional. If this function is supplied and the filter needs * to collect omits, then this function is called once before * free_fn is called. * * This is required because the following two conditions hold: * * a. A tree filter can add and remove objects as an object * graph is traversed. * b. A combine filter's omit set is the union of all its * subfilters, which may include tree: filters. * * As such, the omits sets must be separate sets, and can only * be unioned after the traversal is completed. */ > Here comes the tricky part... > > > +static int should_delegate(enum list_objects_filter_situation filter_situation, > > + struct object *obj, > > + struct subfilter *sub) > > +{ > > + if (!sub->is_skipping_tree) > > + return 1; > > + if (filter_situation == LOFS_END_TREE && > > + oideq(&obj->oid, &sub->skip_tree)) { > > + sub->is_skipping_tree = 0; > > + return 1; > > + } > > + return 0; > > +} > > Optional: I think this should be called "test_and_set_skip_tree" or > something like that, made to return the inverse of its current return > value, and documented: > > Returns the value of sub->is_skipping_tree at the moment of > invocation. If iteration is at the LOFS_END_TREE of the tree currently > being skipped, first clears sub->is_skipping_tree before returning. > That is not totally accurate, since in the second if block (LOFS_END_TREE) we would return 0 even though is_skipping_tree starts out as 1. Since it's not equal to is_skipping_tree before invocation, I don't think there is a better name to give it ATM. So in the interest of removing "trickiness" I've inlined the function (since there is no real way to express what it's doing in a function name anyway) and I think it's more readable. This is the new callsite: /* * Check and update is_skipping_tree before oidset_contains so * that is_skipping_tree gets unset even when the object is * marked as seen. As of this writing, no filter uses * LOFR_MARK_SEEN on trees that also uses LOFR_SKIP_TREE, so the * ordering is only theoretically important. Be cautious if you * change the order of the below checks and more filters have * been added! */ if (sub->is_skipping_tree) { if (filter_situation == LOFS_END_TREE && oideq(&obj->oid, &sub->skip_tree)) sub->is_skipping_tree = 0; else return LOFR_ZERO; } Hopefully that's agreeable for everyone.