Re: [PATCH 06/14] refs: stop re-verifying common prefixes for availability

Karthik Nayak <karthik.188@xxxxxxxxx> · Tue, 18 Feb 2025 08:12:05 -0800

Patrick Steinhardt <ps@xxxxxx> writes:

> One of the checks done by `refs_verify_refnames_available()` is whether
> any of the prefixes of a reference already exists. For example, given a
> reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
> already exist, and if so we'd abort the transaction.
>
> When updating multiple references at once, this check is performed for
> each of the references individually. Consequently, because references
> tend to have common prefixes like "refs/heads/" or refs/tags/", we
> evaluate the availability of these prefixes repeatedly. Naturally this
> is a waste of compute, as the availability of those prefixes should in
> general not change in the middle of a transaction. And if it would,
> backends would notice at a later point in time.
>
> Optimize this pattern by storing prefixes in a `strset` so that we can
> trivially track those prefixes that we have already checked. This leads
> to a significant speedup when creating many references that all share a
> common prefix:
>
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
>       Range (min … max):    60.6 ms …  69.5 ms    38 runs
>
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
>       Range (min … max):    38.1 ms …  47.3 ms    61 runs
>
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>
> Note that the same speedup cannot be observed for the "files" backend
> because it still performs availability check per reference.
>

In the previous commit, you started using the new function in the
reftable backend, can we not make a similar change to the files backend?

> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  refs.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/refs.c b/refs.c
> index 5a9b0f2fa1e..eaf41421f50 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  {
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> +	struct strset dirnames;
>  	int ret = -1;
>
>  	/*
> @@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
>
>  	assert(err);
>
> +	strset_init(&dirnames);
> +
>  	for (size_t i = 0; i < refnames->nr; i++) {
>  		const char *refname = refnames->items[i].string;
>  		const char *extra_refname;
> @@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  			if (skip && string_list_has_string(skip, dirname.buf))
>  				continue;
>
> +			/*
> +			 * If we've already seen the directory we don't need to
> +			 * process it again. Skip it to avoid checking checking
> +			 * common prefixes like "refs/heads/" repeatedly.
> +			 */
> +			if (!strset_add(&dirnames, dirname.buf))
> +				continue;
> +

This was simple and neat. Nice.

>  			if (!initial_transaction &&
>  			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
>  					       &type, &ignore_errno)) {
> @@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  cleanup:
>  	strbuf_release(&referent);
>  	strbuf_release(&dirname);
> +	strset_clear(&dirnames);
>  	return ret;
>  }
>
>
> --
> 2.48.1.666.gff9fcf71b7.dirty
Attachment:
signature.asc

Description: PGP signature