On Tue, Sep 17, 2024 at 05:26:48AM -0400, Taylor Blau wrote:
> On Mon, Sep 16, 2024 at 10:50:16AM +0200, Patrick Steinhardt wrote:
> > +	/*
> > +	 * When the reference name is lexicographically bigger than the
> > +	 * current exclude pattern we know that it won't ever match any
> > +	 * of the following references, either. We thus advance to the
> > +	 * next pattern and re-check whether it matches.
> > +	 *
> > +	 * Otherwise, if it's smaller, then we do not have a match and
> > +	 * thus want to show the current reference.
> > +	 */
> > +	cmp = strncmp(iter->ref.refname, pattern,
> > +		      iter->exclude_patterns_strlen);
> > +	if (cmp > 0) {
> > +		iter->exclude_patterns_index++;
> > +		iter->exclude_patterns_strlen = 0;
> > +		continue;
> > +	}
> > +	if (cmp < 0)
> > +		return 0;
>
> Perhaps I am showing my ignorance for the reftable backend, but is it OK
> to call strncmp() against all patterns here?
>
> In the packed-refs implementation which I worked on with Peff sometime
> last year, we rejected exclude patterns that contain special glob
> characters in them for exactly this reason.
>
> The implementation that I'm referring to has a helpful comment that
> jogged my memory for what we were thinking at the time (see the
> function refs/packed-backend.c::populate_excluded_jump_list()).
>
> Ugh, I just read the next hunk below, so ignore me here ;-).

;) I was also wondering whether we'd want to amend the generic parts of
the refs interface to filter out globs. But it is entirely feasible that
a backend can indeed filter out globs efficiently, even though none of
the current ones can. So it kind of makes sense to keep things as-is and
let the backends themselves decide what they can use.
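To illustrate why a single strncmp() per reference is enough here: both
the references and the (non-glob) exclude patterns are sorted, so once a
refname sorts after the current pattern, that pattern can never match
again and we can advance past it for good. Below is a stand-alone sketch
of that idea; the function and variable names are made up for
illustration and this is not the actual reftable iterator (which also
caches the pattern length instead of recomputing it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `refname` is covered by one of the sorted, NULL-terminated
 * prefix patterns, 0 otherwise. `*index` persists across calls so that
 * patterns which can no longer match are skipped exactly once.
 */
static int ref_is_excluded(const char *refname,
			   const char **patterns, size_t *index)
{
	while (patterns[*index]) {
		const char *pattern = patterns[*index];
		int cmp = strncmp(refname, pattern, strlen(pattern));

		if (cmp > 0) {
			/*
			 * The refname sorts after the pattern, so no later
			 * refname can match it either: advance permanently.
			 */
			(*index)++;
			continue;
		}
		if (cmp < 0)
			return 0; /* refname sorts before pattern: show it */
		return 1; /* prefix match: exclude */
	}
	return 0;
}
```

Because `*index` only ever moves forward, iterating all references costs
O(refs + patterns) comparisons rather than O(refs * patterns).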
> > +static char **filter_exclude_patterns(const char **exclude_patterns)
> > +{
> > +	size_t filtered_size = 0, filtered_alloc = 0;
> > +	char **filtered = NULL;
> > +
> > +	if (!exclude_patterns)
> > +		return NULL;
> > +
> > +	for (size_t i = 0; ; i++) {
> > +		const char *exclude_pattern = exclude_patterns[i];
> > +		int has_glob = 0;
> > +
> > +		if (!exclude_pattern)
> > +			break;
> > +
> > +		for (const char *p = exclude_pattern; *p; p++) {
> > +			has_glob = is_glob_special(*p);
> > +			if (has_glob)
> > +				break;
> > +		}
> > +		if (has_glob)
> > +			continue;
> > +
> > +		ALLOC_GROW(filtered, filtered_size + 1, filtered_alloc);
> > +		filtered[filtered_size++] = xstrdup(exclude_pattern);
> > +	}
> > +
> > +	if (filtered_size) {
> > +		QSORT(filtered, filtered_size, qsort_strcmp);
> > +		ALLOC_GROW(filtered, filtered_size + 1, filtered_alloc);
> > +		filtered[filtered_size++] = NULL;
> > +	}
> > +
> > +	return filtered;
> > +}
>
> Ohhh, here's where we filter out un-excludeable patterns. Ignore me :-).
>
> One question I had reading this is why we don't filter these out on the
> fly in the iterator itself instead of allocating a separate array that
> we have to xstrdup() into and free later on.
>
> We may be at the point of diminishing returns here, but I wonder if
> allocating this thing is more expensive than a few redundant strcmp()s
> and calls to is_glob_special(). I dunno.

I think duplicating the array is the right thing to do anyway so that we
do not get weird lifetime issues with the exclude patterns. A caller may
set up a ref iterator that they end up using for longer than they keep
the exclude patterns passed to the iterator alive. So by duplicating the
array we might end up wasting a bit of memory, but we avoid such
lifetime problems completely, which I think is a win that makes the
infrastructure less fragile.

Thanks for your review!

Patrick