Re: [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs

Justin Tobler <jltobler@xxxxxxxxx> · Wed, 19 Feb 2025 12:21:44 -0600

On 25/02/19 02:23PM, Patrick Steinhardt wrote:
> Most of the commands in git-update-ref(1) accept an old and/or new
> object ID to update a specific reference to. These object IDs get parsed
> via `repo_get_oid()`, which not only handles plain object IDs, but also
> those that have a suffix like "~" or "^2". More surprisingly though, it
> even knows to resolve references, despite the fact that its manpage does
> not mention this fact even once.
> 
> One consequence of this is that we also check for ambiguous references:
> when parsing a full object ID where the DWIM mechanism would also cause
> us to resolve it as a branch, we'd end up printing a warning. While this
> check makes sense to have in general, it is arguably less useful in the
> context of git-update-ref(1). This is out of two reasons:
> 
>   - The manpage is explicitly structured around object IDs. So if we see
>     a fully blown object ID, the intent should be quite clear in
>     general.

Makes sense.

>   - The command is part of our plumbing layer and not a tool that users
>     would generally use in interactive workflows. As such, the warning
>     will likely not be visible to anybody in the first place.

Ok, so in many cases already the warning is not propagated which makes
its computation wasteful to begin with.

> Furthermore, this check can be quite expensive when updating lots of
> references via `--stdin`, because we try to read multiple references per
> object ID that we parse according to the DWIM rules. This effect can be
> seen both with the "files" and "reftable" backend.
> 
> The issue is not unique to git-update-ref(1), but was also an issue in
> git-cat-file(1), where it was addressed by disabling the ambiguity check
> in 25fba78d36b (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12).
> 
> Disable the warning in git-update-ref(1), which provides a significant
> speedup with both backends. The following benchmark creates 10000 new
> references with a 100000 preexisting refs with the "files" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
>       Range (min … max):   461.9 ms … 479.3 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
>       Range (min … max):   384.9 ms … 405.7 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> And with the "reftable" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
>       Range (min … max):   142.7 ms … 150.8 ms    19 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
>       Range (min … max):    61.1 ms …  66.6 ms    41 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> Note that the absolute improvement with both backends is roughly in the
> same ballpark, but the relative improvement for the "reftable" backend
> is more significant because writing the new table to disk is faster in
> the first place.
> 
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  builtin/update-ref.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 4d35bdc4b4b..d603f54b770 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		(*next)++;
>  		*next = parse_arg(*next, &arg);
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else {
>  			/* Without -z, an empty value means all zeros: */
> @@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		*next += arg.len;
>  
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
>  			/* With -z, treat an empty value as all zeros: */
> @@ -772,7 +774,8 @@ int cmd_update_ref(int argc,
>  		refname = argv[0];
>  		value = argv[1];
>  		oldval = argv[2];
> -		if (repo_get_oid(the_repository, value, &oid))
> +		if (repo_get_oid_with_flags(the_repository, value, &oid,
> +					    GET_OID_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid SHA1", value);
>  	}
>  
> @@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
>  			 * must not already exist:
>  			 */
>  			oidclr(&oldoid, the_repository->hash_algo);
> -		else if (repo_get_oid(the_repository, oldval, &oldoid))
> +		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
> +						 GET_OID_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid old SHA1", oldval);
>  	}

In builtin/update-ref.c all uses of repo_get_oid() have been converted
to repo_get_oid_with_flags() with the GET_OID_SKIP_AMBIGUITY_CHECK flag
except for one in parse_cmd_symref_update(). Is there reason to leave
that one untouched?

-Justin