Hi,

this is the second version of my patch series to make git-clone(1)
initialize the reference database with the correct hash format. This is
a preparatory step for the reftable backend, which encodes the hash
format on-disk and thus requires us to get the hash format right at the
time of initialization.

Changes compared to v1:

    - Patch 1: Extend the comment explaining why we always create the
      "refs/" directory.

    - Patch 3: Explain better why the patch is required in the first
      place.

    - Patch 4: Fix a typo.

    - Patch 7: Adapt a failing test to now assert the new behaviour.

Thanks for your reviews!

Patrick

Patrick Steinhardt (7):
  setup: extract function to create the refdb
  setup: allow skipping creation of the refdb
  remote-curl: rediscover repository when fetching refs
  builtin/clone: fix bundle URIs with mismatching object formats
  builtin/clone: set up sparse checkout later
  builtin/clone: skip reading HEAD when retrieving remote
  builtin/clone: create the refdb with the correct object format

 builtin/clone.c             |  65 ++++++++++----------
 remote-curl.c               |   7 ++-
 remote.c                    |  26 ++++----
 remote.h                    |   1 +
 setup.c                     | 114 ++++++++++++++++++++++--------------
 setup.h                     |   6 +-
 t/t5550-http-fetch-dumb.sh  |   4 +-
 t/t5558-clone-bundle-uri.sh |  18 ++++++
 8 files changed, 150 insertions(+), 91 deletions(-)

Range-diff against v1:
1:  b69c57d272 ! 1:  2f34daa082 setup: extract function to create the refdb
    @@ Commit message
         calling `init_db()`. Extract the logic into a standalone function so
         that it becomes easier to do this refactoring.
     
    +    While at it, expand the comment that explains why we always create the
    +    "refs/" directory.
    +
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## setup.c ##
    @@ setup.c: void initialize_repository_version(int hash_algo, int reinit)
     +	int reinit = is_reinit();
     +
     +	/*
    -+	 * We need to create a "refs" dir in any case so that older
    -+	 * versions of git can tell that this is a repository.
    ++	 * We need to create a "refs" dir in any case so that older versions of
    ++	 * Git can tell that this is a repository. This serves two main purposes:
    ++	 *
    ++	 * - Clients will know to stop walking the parent-directory chain when
    ++	 *   detecting the Git repository. Otherwise they may end up detecting
    ++	 *   a Git repository in a parent directory instead.
    ++	 *
    ++	 * - Instead of failing to detect a repository with unknown reference
    ++	 *   format altogether, old clients will print an error saying that
    ++	 *   they do not understand the reference format extension.
     +	 */
     +	safe_create_dir(git_path("refs"), 1);
     +	adjust_shared_perm(git_path("refs"));
2:  090c52423e = 2:  40005ac1f1 setup: allow skipping creation of the refdb
3:  a1b86a0cbb ! 3:  374a1c514b remote-curl: rediscover repository when fetching refs
    @@ Metadata
      ## Commit message ##
         remote-curl: rediscover repository when fetching refs
     
    -    We're about to change git-clone(1) so that we set up the reference
    -    database at a later point. This change will cause git-remote-curl(1) to
    -    not detect the repository anymore due to "HEAD" not having been created
    -    yet at the time it spawns, and thus causes it to error out once it is
    -    asked to fetch the references.
    +    The reftable format encodes the hash function used by the repository
    +    inside of its tables. The reftable backend thus needs to be initialized
    +    with the correct hash function right from the start, or otherwise we may
    +    end up writing tables with the wrong hash function. But git-clone(1)
    +    initializes the reference database before learning about the hash
    +    function used by the remote repository, which has never been a problem
    +    with the reffiles backend.
    +
    +    To fix this, we'll have to change git-clone(1) to be more careful and
    +    only create the reference backend once it learned about the remote hash
    +    function. This creates a problem for git-remote-curl(1), which will then
    +    be spawned at a time where the repository is not yet fully-initialized.
    +    Consequentially, git-remote-curl(1) will fail to detect the repository,
    +    which eventually causes it to error out once it is asked to fetch remote
    +    objects.
     
         We can address this issue by trying to re-discover the Git repository in
         case none was detected at startup time. With this change, the clone will
    @@ Commit message
             repository
     
           4. git-clone(1) creates the reference database as it has now learned
    -         about the object format.
    +         about the hash function.
     
           5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
              The latter notices that it doesn't have a repository available, but
4:  c7a9d6ef74 ! 4:  3bef564b57 builtin/clone: fix bundle URIs with mismatching object formats
    @@ Commit message
         is indeed not necessarily the case for the hash algorithm right now. So
         in practice it is the right thing to detect the remote's object format
         before downloading bundle URIs anyway, and not doing so causes clones
    -    with bundle URIS to fail when the local default object format does not
    +    with bundle URIs to fail when the local default object format does not
         match the remote repository's format.
     
         Unfortunately though, this creates a new issue: downloading bundles may
5:  703ff77eea = 5:  917f15055f builtin/clone: set up sparse checkout later
6:  6c919fb19c = 6:  f3485a2708 builtin/clone: skip reading HEAD when retrieving remote
7:  eb5530e6a8 ! 7:  f062b11550 builtin/clone: create the refdb with the correct object format
    @@ Commit message
         upcoming reftable backend when cloning repositories with the SHA256
         object format.
     
    +    This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an
    +    empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects
    +    the resulting hash format of the empty cloned repository to match the
    +    default hash, but now we always end up with a sha1 repository. The
    +    problem is that for dumb HTTP fetches, we have no easy way to figure out
    +    the remote's hash function except for deriving it based on the hash
    +    length of refs in `info/refs`. But as the remote repository is empty we
    +    cannot rely on this detection mechanism.
    +
    +    Before the change in this commit we already initialized the repository
    +    with the default hash function and then left it as-is. With this patch
    +    we always use the hash function detected via the remote, where we fall
    +    back to "sha1" in case we cannot detect it.
    +
    +    Neither the old nor the new behaviour are correct as we second-guess the
    +    remote hash function in both cases. But given that this is a rather
    +    unlikely edge case (we use the dumb HTTP protocol, the remote repository
    +    uses SHA256 and the remote repository is empty), let's simply adapt the
    +    test to assert the new behaviour. If we want to properly address this
    +    edge case in the future we will have to extend the dumb HTTP protocol so
    +    that we can properly detect the hash function for empty repositories.
    +
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## builtin/clone.c ##
    @@ setup.h: int init_db(const char *git_dir, const char *real_git_dir,
      /*
       * NOTE NOTE NOTE!!
    +
    + ## t/t5550-http-fetch-dumb.sh ##
    +@@ t/t5550-http-fetch-dumb.sh: test_expect_success 'create empty remote repository' '
    + 	setup_post_update_server_info_hook "$HTTPD_DOCUMENT_ROOT_PATH/empty.git"
    + '
    +
    +-test_expect_success 'empty dumb HTTP repository has default hash algorithm' '
    ++test_expect_success 'empty dumb HTTP repository falls back to SHA1' '
    + 	test_when_finished "rm -fr clone-empty" &&
    + 	git clone $HTTPD_URL/dumb/empty.git clone-empty &&
    + 	git -C clone-empty rev-parse --show-object-format >empty-format &&
    +-	test "$(cat empty-format)" = "$(test_oid algo)"
    ++	test "$(cat empty-format)" = sha1
    + '
    +
    + setup_askpass_helper

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
2.43.GIT
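
For readers who want to see why the empty dumb-HTTP case in patch 7 has
to fall back, here is a rough, standalone sketch of length-based hash
detection. It is not the code from this series: all identifiers
(`sketch_hash_algo`, `sketch_algo_by_hexlen()`, `sketch_detect_algo()`)
are made up for illustration, and it assumes that each `info/refs` line
has the form "<hex-oid>\t<refname>\n" and that looking at the first line
is enough.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for this sketch; Git's own code uses different ones. */
enum sketch_hash_algo {
	SKETCH_HASH_UNKNOWN = 0,
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/* Map the length of a hex-encoded object ID to a hash function. */
enum sketch_hash_algo sketch_algo_by_hexlen(size_t hexlen)
{
	switch (hexlen) {
	case 40:
		return SKETCH_HASH_SHA1;
	case 64:
		return SKETCH_HASH_SHA256;
	default:
		return SKETCH_HASH_UNKNOWN;
	}
}

/*
 * Guess the hash function from the contents of "info/refs". Only the
 * first line is inspected. When nothing can be derived, e.g. because
 * the remote repository is empty, fall back to SHA1.
 */
enum sketch_hash_algo sketch_detect_algo(const char *info_refs)
{
	size_t hexlen = 0;
	enum sketch_hash_algo algo = SKETCH_HASH_UNKNOWN;

	assert(info_refs != NULL);

	/* Count the leading hex digits of the first line. */
	while (isxdigit((unsigned char)info_refs[hexlen]))
		hexlen++;

	/* The object ID must be followed by a tab and the refname. */
	if (info_refs[hexlen] == '\t')
		algo = sketch_algo_by_hexlen(hexlen);

	return algo == SKETCH_HASH_UNKNOWN ? SKETCH_HASH_SHA1 : algo;
}
```

With an empty `info/refs` there is no object ID whose length could be
measured, so the sketch cannot tell a SHA256 remote from a SHA1 one and
returns the fallback, which mirrors the behaviour the adapted test now
asserts.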