Re: [PATCH 6/6] refs/reftable: wire up support for exclude patterns

Patrick Steinhardt <ps@xxxxxx> · Mon, 16 Sep 2024 08:56:16 +0200

On Fri, Sep 13, 2024 at 07:47:06AM -0500, karthik nayak wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> > +/*
> > + * Handle exclude patterns. Returns either `1`, which tells the caller that the
> > + * current reference shall not be shown. Or `0`, which indicates that it should
> > + * be shown.
> > + */
> > +static int should_exclude_current_ref(struct reftable_ref_iterator *iter)
> > +{
> > +	while (iter->exclude_patterns[iter->exclude_patterns_index]) {
> > +		const char *pattern = iter->exclude_patterns[iter->exclude_patterns_index];
> > +		char *ref_after_pattern;
> > +		int cmp;
> > +
> > +		/*
> > +		 * Lazily cache the pattern length so that we don't have to
> > +		 * recompute it every time this function is called.
> > +		 */
> > +		if (!iter->exclude_patterns_strlen)
> > +			iter->exclude_patterns_strlen = strlen(pattern);
> > +
> > +		/*
> > +		 * When the reference name is lexicographically bigger than the
> > +		 * current exclude pattern we know that it won't ever match any
> > +		 * of the following references, either. We thus advance to the
> > +		 * next pattern and re-check whether it matches.
> 
> So this means that the exclude patterns were lexicographically sorted.
> Otherwise this would work.

Indeed. Good that you call out my assumption, as I in fact didn't verify
that it holds, and in fact it doesn't. It's not a correctness issue if
it doesn't hold, because it would simply mean that we don't skip over
some references where we really could. But it certainly is a perfomance
issue.

Will fix and add a test for it.

> > +		 * Note that the seeked-to reference may also be excluded. This
> > +		 * is not handled here though, but the caller is expected to
> > +		 * loop and re-verify the next reference for us.
> > +		 */
> 
> The seeked-to reference here being the one with 0xff. We could get rid
> of this by doing something like this:
> 
>     int last_char_idx = iter->exclude_patterns_strlen - 1
>     ref_after_pattern = xstrfmt("%s", pattern);
>     ref_after_pattern[last_char_idx] = ref_after_pattern[last_char_idx] + 1;
> 
> instead no?

Sorry, I don't quite follow what you mean with "get rid of this". What
exactly is "this"? Do you mean the re-looping?

If so then the above doesn't fix it, no. We'd have to repeat a whole lot
of code here to also retrieve the next entry, store it into `iter->ref`,
check whether it is an actual ref starting with "refs/" and so on.
Looping once very much feels like the better thing to do.

> > @@ -580,9 +660,45 @@ static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
> >  	.abort = reftable_ref_iterator_abort
> >  };
> >
> > +static char **filter_exclude_patterns(const char **exclude_patterns)
> > +{
> > +	size_t filtered_size = 0, filtered_alloc = 0;
> > +	char **filtered = NULL;
> > +
> > +	if (!exclude_patterns)
> > +		return NULL;
> > +
> > +	for (size_t i = 0; ; i++) {
> > +		const char *exclude_pattern = exclude_patterns[i];
> > +		int has_glob = 0;
> > +
> > +		if (!exclude_pattern)
> > +			break;
> > +
> > +		for (const char *p = exclude_pattern; *p; p++) {
> > +			has_glob = is_glob_special(*p);
> > +			if (has_glob)
> > +				break;
> > +		}
> 
> Why do we need to filter excludes here? Don't the callee's already do
> something like this?

No, it doesn't. The code for exclude patterns is structured in such a
way that the responsibility is with the backend to decide what it can
and cannot filter. In theory there could be a backend that can exclude
refs based on globs efficiently, even though neither the "files" nor the
"reftable" backend can.

Patrick