On Sat, Dec 09, 2023 at 07:16:30PM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > > While at it I noticed that this actually fixes a bug with bundle URIs > > when the object formats diverge in this way. > > ... > > This patch series is actually the last incompatibility for the reftable > > backend that I have found. All tests except for the files-backend > > specific ones pass now with the current state I have at [1], which is > > currently at e6f2f592b7 (t: skip tests which are incompatible with > > reftable, 2023-11-24) > > An existing test > > $ make && cd t && GIT_TEST_DEFAULT_HASH=sha256 sh t5550-http-fetch-dumb.sh > > passes with vanilla Git 2.43, but with these patches applied, it > fails the "7 - empty dumb HTTP" step. Indeed -- now that the GitLab CI definitions have landed in your master branch I can also see this failure as part of our CI job [1]. And with the NixOS httpd refactorings I'm actually able to run these tests in the first place. Really happy to see that things come together like this, as it means that we'll detect such issues before we send the series to the mailing list from now on. Anyway, regarding the test itself. I think the refactorings in this patch series uncover a preexisting and already-known issue with empty repositories when using the dumb HTTP protocol: there is no proper way to know about the hash function that the remote repository uses [2]. Before my refactorings we used to fall back to the local default hash format with which we've already initialized the repository, which is wrong. Now we use the hash format we detected via the remote, which we cannot detect because the remote is empty and does not advertise the hash function, so we fall back to SHA1 and thus also do the wrong thing. The only correct thing here would be to use the actual hash function that the remote repository uses, but we have no to do so. So we're kind of stuck here and can only choose between two different wrong ways to pick the hash function. We can restore the previous wrong behaviour by honoring GIT_DEFAULT_HASH in git-remote-curl(1) in the case where we do not have a repository set up yet. So something similar in spirit to the following: ``` diff --git a/remote-curl.c b/remote-curl.c index fc29757b65..7e97c9c2e9 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -27,6 +27,7 @@ static struct remote *remote; /* always ends with a trailing slash */ static struct strbuf url = STRBUF_INIT; +static int nongit; struct options { int verbosity; @@ -275,8 +276,30 @@ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads) { const char *p = memchr(heads->buf, '\t', heads->len); int algo; - if (!p) - return the_hash_algo; + + if (!p) { + const char *env; + + if (!nongit) + return the_hash_algo; + + /* + * In case the remote neither advertises the hash format nor + * any references we have no way to detect the correct hash + * format. We can thus only guess what the remote is using, + * where the best guess is to fall back to the default hash. + */ + env = getenv("GIT_DEFAULT_HASH"); + if (env) { + algo = hash_algo_by_name(env); + if (algo == GIT_HASH_UNKNOWN) + die(_("unknown hash algorithm '%s'"), env); + } else { + algo = GIT_HASH_SHA1; + } + + return &hash_algos[algo]; + } algo = hash_algo_by_length((p - heads->buf) / 2); if (algo == GIT_HASH_UNKNOWN) @@ -1521,7 +1544,6 @@ static int stateless_connect(const char *service_name) int cmd_main(int argc, const char **argv) { struct strbuf buf = STRBUF_INIT; - int nongit; int ret = 1; setup_git_directory_gently(&nongit); ``` +Cc brian, as he's the author of [2]. Patrick [1]: https://gitlab.com/gitlab-org/git/-/jobs/5723052108 [2]: ac093d0790 (remote-curl: detect algorithm for dumb HTTP by size, 2020-06-19)
Attachment:
signature.asc
Description: PGP signature