Re: [PATCH] clone: ignore invalid local refs in remote

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 18 Apr 2022 19:30:35 +0200

On Mon, Apr 18 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>
> When cloning directly from a local repository, we load a list of refs
> based on scanning the $GIT_DIR/refs/ directory of the "server"
> repository. If files exist in that directory that do not parse as
> hexadecimal hashes, then the ref array used by write_remote_refs()
> ends up with some entries with null OIDs. This causes us to hit a BUG()
> statement in ref_transaction_create():
>
>   BUG: create called without valid new_oid
>
> This BUG() call used to be a die() until 033abf97f (Replace all
> die("BUG: ...") calls by BUG() ones, 2018-05-02). Before that, the die()
> was added by f04c5b552 (ref_transaction_create(): check that new_sha1 is
> valid, 2015-02-17).
>
> The original report for this bug [1] mentioned that this problem did not
> exist in Git 2.27.0. The failure bisects unsurprisingly to 968f12fda
> (refs: turn on GIT_REF_PARANOIA by default, 2021-09-24). When
> GIT_REF_PARANOIA is enabled, this case always fails as far back as I am
> able to successfully compile and test the Git codebase.
>
> [1] https://github.com/git-for-windows/git/issues/3781
>
> There are two approaches to consider here. One would be to remove this
> BUG() statement in favor of returning with an error. There are only two
> callers to ref_transaction_create(), so this would have a limited
> impact.
>
> The other approach would be to add special casing in 'git clone' to
> avoid this faulty input to the method.
>
> While I originally started with changing 'git clone', I decided that
> modifying ref_transaction_create() was a more complete solution. This
> prevents failing with a BUG() statement when we already have a good way
> to report an error (including a reason for that error) within the
> method. Both callers properly check the return value and die() with the
> error message, so this is an appropriate direction.
>
> The added test helps check against a regression, but does check that our
> intended error message is handled correctly.
>
> Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
> ---
>     clone: ignore invalid local refs in remote
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1214%2Fderrickstolee%2Frefs-bug-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1214/derrickstolee/refs-bug-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1214
>
>  refs.c                 | 6 ++++--
>  t/t5605-clone-local.sh | 9 +++++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 1a964505f92..f300f83e4d4 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1111,8 +1111,10 @@ int ref_transaction_create(struct ref_transaction *transaction,
>  			   unsigned int flags, const char *msg,
>  			   struct strbuf *err)
>  {
> -	if (!new_oid || is_null_oid(new_oid))
> -		BUG("create called without valid new_oid");
> +	if (!new_oid || is_null_oid(new_oid)) {
> +		strbuf_addf(err, "'%s' has a null OID", refname);
> +		return 1;
> +	}
>  	return ref_transaction_update(transaction, refname, new_oid,
>  				      null_oid(), flags, msg, err);
>  }
> diff --git a/t/t5605-clone-local.sh b/t/t5605-clone-local.sh
> index 7d63365f93a..21ab6192839 100755
> --- a/t/t5605-clone-local.sh
> +++ b/t/t5605-clone-local.sh
> @@ -141,4 +141,13 @@ test_expect_success 'cloning locally respects "-u" for fetching refs' '
>  	test_must_fail git clone --bare -u false a should_not_work.git
>  '
>  
> +test_expect_success 'local clone from repo with corrupt refs fails gracefully' '
> +	git init corrupt &&
> +	test_commit -C corrupt one &&
> +	echo a >corrupt/.git/refs/heads/topic &&
> +
> +	test_must_fail git clone corrupt working 2>err &&
> +	grep "has a null OID" err
> +'
> +
>  test_done
>
> base-commit: 11cfe552610386954886543f5de87dcc49ad5735

If I change this test to clone from "$PWD/corrupt" instead we get as far
as dying in "not our ref" in upload-pack", after the "client" said "want
0000....".

I don't see anything wrong with this narrow fix, but I wonder how close
we are to just doing something useful here for the user. I.e. if we
can't parse_object() what we're about to "copy" to e.g. error() on that,
but carry on cloning the rest if possible.

We should probably always return a non-zero at the end, suggesting the
user omit the bad ref with a refspec. But if we did all that we could
usefully recover partially corrupt-repos in this state...