Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey, it's me again, coming back way too late to figure out that this fix
had an adverse side effect in a corner case.

Here's a simplified reproducer:
```
$ cat > git-remote-foo <<EOF
#!/bin/sh
# capabilities
echo option
echo import
echo push
echo
# option progress true
echo ok
# option verbosity 1
echo ok
# list
echo
EOF
$ chmod +x git-remote-foo
$ PATH=$PWD:$PATH git clone foo::bar
Cloning into 'bar'...
created HEAD
warning: this remote helper should implement refspec capability
created reference db
warning: You appear to have cloned an empty repository.
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
```

That hint is the adverse side effect. It shouldn't be shown.

It's happening because `builtin/clone.c` does this:
	`create_reference_database(the_repository->ref_storage_format, NULL, 1);`

`create_reference` calls `is_reinit`, which returns true because there is
HEAD, and proceeds to skip the block behind `if (!reinit)`.

Before the change, that code block that is now skipped would have called
`git_default_branch_name(quiet)` (where `quiet` is 1, per the above call
to `create_reference_database`). In turn
`git_default_branch_name(quiet)` would have a) skipped the initial
branch hint (because quiet), and b) cached the branch name for subsequent
`git_default_branch_name` calls.

Then later in `builtin/clone.c`, we end up in this code:
	`branch = git_default_branch_name(0);`

Before the change, this would get the cache value from the previous
call, but now it has to compute it, and because 0 is passed, it's not
quiet about the initial branch hint.

The non-remote-helper case goes through with this without a problem
because it ends up in the transport_ls_refs_options.unborn_head_target
case and never calls `git_default_branch_name(0)`.

Mike

On Tue, Feb 27, 2024 at 03:27:44PM +0100, Patrick Steinhardt wrote:
> In 18c9cb7524 (builtin/clone: create the refdb with the correct object
> format, 2023-12-12), we have changed git-clone(1) so that it delays
> creation of the refdb until after it has learned about the remote's
> object format. This change was required for the reftable backend, which
> encodes the object format into the tables. So if we pre-initialized the
> refdb with the default object format, but the remote uses a different
> object format than that, then the resulting tables would have encoded
> the wrong object format.
> 
> This change unfortunately breaks remote helpers which try to access the
> repository that is about to be created. Because the refdb has not yet
> been initialized at the point where we spawn the remote helper, we also
> don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
> the remote helper which try to access the repository would fail because
> it cannot be discovered.
> 
> This is essentially a chicken-and-egg problem: we cannot initialize the
> refdb because we don't know about the object format. But we cannot learn
> about the object format because the remote helper may be unable to
> access the partially-initialized repository.
> 
> Ideally, we would address this issue via capabilities. But the remote
> helper protocol is not structured in a way that guarantees that the
> capability announcement happens before the remote helper tries to access
> the repository.
> 
> Instead, fix this issue by partially initializing the refdb up to the
> point where it becomes discoverable by Git commands.
> 
> Reported-by: Mike Hommey <mh@xxxxxxxxxxxx>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
>  setup.c                    |  9 +++++++-
>  t/t5801/git-remote-testgit |  5 +++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index bad1b70ce8..5d7f112125 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -926,6 +926,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	struct ref *mapped_refs = NULL;
>  	const struct ref *ref;
>  	struct strbuf key = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
>  	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
>  	struct transport *transport = NULL;
>  	const char *src_ref_prefix = "refs/heads/";
> @@ -1125,6 +1126,50 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  		git_dir = real_git_dir;
>  	}
>  
> +	/*
> +	 * We have a chicken-and-egg situation between initializing the refdb
> +	 * and spawning transport helpers:
> +	 *
> +	 *   - Initializing the refdb requires us to know about the object
> +	 *     format. We thus have to spawn the transport helper to learn
> +	 *     about it.
> +	 *
> +	 *   - The transport helper may want to access the Git repository. But
> +	 *     because the refdb has not been initialized, we don't have "HEAD"
> +	 *     or "refs/". Thus, the helper cannot find the Git repository.
> +	 *
> +	 * Ideally, we would have structured the helper protocol such that it's
> +	 * mandatory for the helper to first announce its capabilities without
> +	 * yet assuming a fully initialized repository. Like that, we could
> +	 * have added a "lazy-refdb-init" capability that announces whether the
> +	 * helper is ready to handle not-yet-initialized refdbs. If any helper
> +	 * didn't support them, we would have fully initialized the refdb with
> +	 * the SHA1 object format, but later on bailed out if we found out that
> +	 * the remote repository used a different object format.
> +	 *
> +	 * But we didn't, and thus we use the following workaround to partially
> +	 * initialize the repository's refdb such that it can be discovered by
> +	 * Git commands. To do so, we:
> +	 *
> +	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> +	 *
> +	 *   - Create the "refs/" directory.
> +	 *
> +	 *   - Set up the ref storage format and repository version as
> +	 *     required.
> +	 *
> +	 * This is sufficient for Git commands to discover the Git directory.
> +	 */
> +	initialize_repository_version(GIT_HASH_UNKNOWN,
> +				      the_repository->ref_storage_format, 1);
> +
> +	strbuf_addf(&buf, "%s/HEAD", git_dir);
> +	write_file(buf.buf, "ref: refs/heads/.invalid");
> +
> +	strbuf_reset(&buf);
> +	strbuf_addf(&buf, "%s/refs", git_dir);
> +	safe_create_dir(buf.buf, 1);
> +
>  	/*
>  	 * additional config can be injected with -c, make sure it's included
>  	 * after init_db, which clears the entire config environment.
> @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	free(remote_name);
>  	strbuf_release(&reflog_msg);
>  	strbuf_release(&branch_top);
> +	strbuf_release(&buf);
>  	strbuf_release(&key);
>  	free_refs(mapped_refs);
>  	free_refs(remote_head_points_at);
> diff --git a/setup.c b/setup.c
> index b69b1cbc2a..e3b76e84b5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
>  	char repo_version_string[10];
>  	int repo_version = GIT_REPO_VERSION;
>  
> +	/*
> +	 * Note that we initialize the repository version to 1 when the ref
> +	 * storage format is unknown. This is on purpose so that we can add the
> +	 * correct object format to the config during git-clone(1). The format
> +	 * version will get adjusted by git-clone(1) once it has learned about
> +	 * the remote repository's format.
> +	 */
>  	if (hash_algo != GIT_HASH_SHA1 ||
>  	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
>  		repo_version = GIT_REPO_VERSION_READ;
> @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
>  		  "%d", repo_version);
>  	git_config_set("core.repositoryformatversion", repo_version_string);
>  
> -	if (hash_algo != GIT_HASH_SHA1)
> +	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
>  		git_config_set("extensions.objectformat",
>  			       hash_algos[hash_algo].name);
>  	else if (reinit)
> diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> index 1544d6dc6b..bcfb358c51 100755
> --- a/t/t5801/git-remote-testgit
> +++ b/t/t5801/git-remote-testgit
> @@ -12,6 +12,11 @@ url=$2
>  
>  dir="$GIT_DIR/testgit/$alias"
>  
> +if ! git rev-parse --is-inside-git-dir
> +then
> +	exit 1
> +fi
> +
>  h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
>  t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
>  
> -- 
> 2.44.0
> 






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux