Re: [PATCH v2 6/6] refs/reftable: wire up support for exclude patterns

Taylor Blau <me@xxxxxxxxxxxx> · Tue, 17 Sep 2024 05:26:48 -0400

On Mon, Sep 16, 2024 at 10:50:16AM +0200, Patrick Steinhardt wrote:
> This makes t1419 work with the "reftable" backend with some slight
> modifications. Of course it also speeds up listing of references with
> hidden refs. The following benchmark prints one reference with 1 million
> hidden references:
>
>     Benchmark 1: HEAD~
>       Time (mean ± σ):      93.3 ms ±   2.1 ms    [User: 90.3 ms, System: 2.5 ms]
>       Range (min … max):    89.8 ms …  97.2 ms    33 runs
>
>     Benchmark 2: HEAD
>       Time (mean ± σ):       4.2 ms ±   0.6 ms    [User: 2.2 ms, System: 1.8 ms]
>       Range (min … max):     3.1 ms …   8.1 ms    765 runs
>
>     Summary
>       HEAD ran
>        22.15 ± 3.19 times faster than HEAD~

Very nice.

> +		/*
> +		 * When the reference name is lexicographically bigger than the
> +		 * current exclude pattern we know that it won't ever match any
> +		 * of the following references, either. We thus advance to the
> +		 * next pattern and re-check whether it matches.
> +		 *
> +		 * Otherwise, if it's smaller, then we do not have a match and
> +		 * thus want to show the current reference.
> +		 */
> +		cmp = strncmp(iter->ref.refname, pattern,
> +			      iter->exclude_patterns_strlen);
> +		if (cmp > 0) {
> +			iter->exclude_patterns_index++;
> +			iter->exclude_patterns_strlen = 0;
> +			continue;
> +		}
> +		if (cmp < 0)
> +			return 0;

Perhaps I am showing my ignorance for the reftable backend, but is it OK
to call strncmp() against all patterns here?

In the packed-refs implementation which I worked on with Peff sometime
last year, we rejected exclude patterns that contain special glob
characters in them for exactly this reason.

The implementation that I'm referring to has a helpful comment that
jogged my memory for what we were thinking at the time (see the
function refs/packed-backend.c::populate_excluded_jump_list()).

Ugh, I just read the next hunk below, so ignore me here ;-).

> +static char **filter_exclude_patterns(const char **exclude_patterns)
> +{
> +	size_t filtered_size = 0, filtered_alloc = 0;
> +	char **filtered = NULL;
> +
> +	if (!exclude_patterns)
> +		return NULL;
> +
> +	for (size_t i = 0; ; i++) {
> +		const char *exclude_pattern = exclude_patterns[i];
> +		int has_glob = 0;
> +
> +		if (!exclude_pattern)
> +			break;
> +
> +		for (const char *p = exclude_pattern; *p; p++) {
> +			has_glob = is_glob_special(*p);
> +			if (has_glob)
> +				break;
> +		}
> +		if (has_glob)
> +			continue;
> +
> +		ALLOC_GROW(filtered, filtered_size + 1, filtered_alloc);
> +		filtered[filtered_size++] = xstrdup(exclude_pattern);
> +	}
> +
> +	if (filtered_size) {
> +		QSORT(filtered, filtered_size, qsort_strcmp);
> +		ALLOC_GROW(filtered, filtered_size + 1, filtered_alloc);
> +		filtered[filtered_size++] = NULL;
> +	}
> +
> +	return filtered;
> +}

Ohhh, here's where we filter out un-excludeable patterns. Ignore me :-).

One question I had reading this is why we don't filter these out on the
fly in the iterator itself instead of allocating a separate array that
we have to xstrdup() into and free later on.

We may be at the point of diminishing returns here, but I wonder if
allocating this thing is more expensive than a few redundant strcmp()s
and calls to is_glob_special(). I dunno.

Thanks,
Taylor