[PATCH 2/2] builtin/clone: allow remote helpers to detect repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In 18c9cb7524 (builtin/clone: create the refdb with the correct object
format, 2023-12-12), we have changed git-clone(1) so that it delays
creation of the refdb until after it has learned about the remote's
object format. This change was required for the reftable backend, which
encodes the object format into the tables. So if we pre-initialized the
refdb with the default object format, but the remote uses a different
object format than that, then the resulting tables would have encoded
the wrong object format.

This change unfortunately breaks remote helpers which try to access the
repository that is about to be created. Because the refdb has not yet
been initialized at the point where we spawn the remote helper, we also
don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
the remote helper which try to access the repository would fail because
it cannot be discovered.

This is essentially a chicken-and-egg problem: we cannot initialize the
refdb because we don't know about the object format. But we cannot learn
about the object format because the remote helper may be unable to
access the partially-initialized repository.

Ideally, we would address this issue via capabilities. But the remote
helper protocol is not structured in a way that guarantees that the
capability announcement happens before the remote helper tries to access
the repository.

Instead, fix this issue by partially initializing the refdb up to the
point where it becomes discoverable by Git commands.

Reported-by: Mike Hommey <mh@xxxxxxxxxxxx>
Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
 builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
 setup.c                    |  9 +++++++-
 t/t5801/git-remote-testgit |  5 +++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index bad1b70ce8..5d7f112125 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -926,6 +926,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct ref *mapped_refs = NULL;
 	const struct ref *ref;
 	struct strbuf key = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
 	struct transport *transport = NULL;
 	const char *src_ref_prefix = "refs/heads/";
@@ -1125,6 +1126,50 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		git_dir = real_git_dir;
 	}
 
+	/*
+	 * We have a chicken-and-egg situation between initializing the refdb
+	 * and spawning transport helpers:
+	 *
+	 *   - Initializing the refdb requires us to know about the object
+	 *     format. We thus have to spawn the transport helper to learn
+	 *     about it.
+	 *
+	 *   - The transport helper may want to access the Git repository. But
+	 *     because the refdb has not been initialized, we don't have "HEAD"
+	 *     or "refs/". Thus, the helper cannot find the Git repository.
+	 *
+	 * Ideally, we would have structured the helper protocol such that it's
+	 * mandatory for the helper to first announce its capabilities without
+	 * yet assuming a fully initialized repository. Like that, we could
+	 * have added a "lazy-refdb-init" capability that announces whether the
+	 * helper is ready to handle not-yet-initialized refdbs. If any helper
+	 * didn't support them, we would have fully initialized the refdb with
+	 * the SHA1 object format, but later on bailed out if we found out that
+	 * the remote repository used a different object format.
+	 *
+	 * But we didn't, and thus we use the following workaround to partially
+	 * initialize the repository's refdb such that it can be discovered by
+	 * Git commands. To do so, we:
+	 *
+	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
+	 *
+	 *   - Create the "refs/" directory.
+	 *
+	 *   - Set up the ref storage format and repository version as
+	 *     required.
+	 *
+	 * This is sufficient for Git commands to discover the Git directory.
+	 */
+	initialize_repository_version(GIT_HASH_UNKNOWN,
+				      the_repository->ref_storage_format, 1);
+
+	strbuf_addf(&buf, "%s/HEAD", git_dir);
+	write_file(buf.buf, "ref: refs/heads/.invalid");
+
+	strbuf_reset(&buf);
+	strbuf_addf(&buf, "%s/refs", git_dir);
+	safe_create_dir(buf.buf, 1);
+
 	/*
 	 * additional config can be injected with -c, make sure it's included
 	 * after init_db, which clears the entire config environment.
@@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	free(remote_name);
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
+	strbuf_release(&buf);
 	strbuf_release(&key);
 	free_refs(mapped_refs);
 	free_refs(remote_head_points_at);
diff --git a/setup.c b/setup.c
index b69b1cbc2a..e3b76e84b5 100644
--- a/setup.c
+++ b/setup.c
@@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
 
+	/*
+	 * Note that we initialize the repository version to 1 when the ref
+	 * storage format is unknown. This is on purpose so that we can add the
+	 * correct object format to the config during git-clone(1). The format
+	 * version will get adjusted by git-clone(1) once it has learned about
+	 * the remote repository's format.
+	 */
 	if (hash_algo != GIT_HASH_SHA1 ||
 	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
 		repo_version = GIT_REPO_VERSION_READ;
@@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
 		  "%d", repo_version);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
 	else if (reinit)
diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
index 1544d6dc6b..bcfb358c51 100755
--- a/t/t5801/git-remote-testgit
+++ b/t/t5801/git-remote-testgit
@@ -12,6 +12,11 @@ url=$2
 
 dir="$GIT_DIR/testgit/$alias"
 
+if ! git rev-parse --is-inside-git-dir
+then
+	exit 1
+fi
+
 h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
 t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
 
-- 
2.44.0

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux