Patrick Steinhardt <ps@xxxxxx> writes: > One of the checks done by `refs_verify_refnames_available()` is whether > any of the prefixes of a reference already exists. For example, given a > reference "refs/heads/main", we'd check whether "refs/heads" or "refs" > already exist, and if so we'd abort the transaction. > > When updating multiple references at once, this check is performed for > each of the references individually. Consequently, because references > tend to have common prefixes like "refs/heads/" or refs/tags/", we > evaluate the availability of these prefixes repeatedly. Naturally this > is a waste of compute, as the availability of those prefixes should in > general not change in the middle of a transaction. And if it would, > backends would notice at a later point in time. > > Optimize this pattern by storing prefixes in a `strset` so that we can > trivially track those prefixes that we have already checked. This leads > to a significant speedup when creating many references that all share a > common prefix: > > Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) > Time (mean ± σ): 63.1 ms ± 1.8 ms [User: 41.0 ms, System: 21.6 ms] > Range (min … max): 60.6 ms … 69.5 ms 38 runs > > Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) > Time (mean ± σ): 40.0 ms ± 1.3 ms [User: 29.3 ms, System: 10.3 ms] > Range (min … max): 38.1 ms … 47.3 ms 61 runs > > Summary > update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran > 1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) > > Note that the same speedup cannot be observed for the "files" backend > because it still performs availability check per reference. > In the previous commit, you started using the new function in the reftable backend, can we not make a similar change to the files backend? > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > refs.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/refs.c b/refs.c > index 5a9b0f2fa1e..eaf41421f50 100644 > --- a/refs.c > +++ b/refs.c > @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs, > { > struct strbuf dirname = STRBUF_INIT; > struct strbuf referent = STRBUF_INIT; > + struct strset dirnames; > int ret = -1; > > /* > @@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs, > > assert(err); > > + strset_init(&dirnames); > + > for (size_t i = 0; i < refnames->nr; i++) { > const char *refname = refnames->items[i].string; > const char *extra_refname; > @@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs, > if (skip && string_list_has_string(skip, dirname.buf)) > continue; > > + /* > + * If we've already seen the directory we don't need to > + * process it again. Skip it to avoid checking checking > + * common prefixes like "refs/heads/" repeatedly. > + */ > + if (!strset_add(&dirnames, dirname.buf)) > + continue; > + This was simple and neat. Nice. > if (!initial_transaction && > !refs_read_raw_ref(refs, dirname.buf, &oid, &referent, > &type, &ignore_errno)) { > @@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs, > cleanup: > strbuf_release(&referent); > strbuf_release(&dirname); > + strset_clear(&dirnames); > return ret; > } > > > -- > 2.48.1.666.gff9fcf71b7.dirty
Attachment:
signature.asc
Description: PGP signature