On Sun, Oct 20, 2024 at 01:22:34AM +0200, fox wrote: > Running git-bisect identifies b1b8dfde6929ec9463eca0a858c4adb9786d7c93 > as the first bad commit, suggesting that the .idx file downloaded from > the remote is now expected to be byte-for-byte identical with a > locally-generated version; due to format differences, they are not. > > The remote idx is in the original (version 1) format, and git > verify-pack seems satisfied with it. Did v2.47.0 intend to block > fetching such indices? Interesting case. No, we did not intend to block fetching those indices. We knew there was a potential problem with dumb http here, but the idea was that if we had a local file "foo.idx" (or "foo.pack"), that we'd never try to download the other side in the first place, preferring ours. And AFAICT that does hold. But here we have the opposite problem! We download the pack idx from the remote first, so that we know what it claims to have (which tells us whether we want to grab the matching pack). And then we grab the matching pack, but instead of using the .idx we downloaded, we index it again ourselves. Which is probably a good idea to make sure that we prefer our locally-checked copy versus what the other side has. But obviously there is no guarantee that we'll come up with the same bytes as the other side if there are version or configuration differences. One option is to just discard the remote version before re-indexing, like this: diff --git a/http.c b/http.c index d59e59f66b..6879d7ba1b 100644 --- a/http.c +++ b/http.c @@ -2500,6 +2500,7 @@ int finish_http_pack_request(struct http_pack_request *preq) struct child_process ip = CHILD_PROCESS_INIT; int tmpfile_fd; int ret = 0; + size_t len; fclose(preq->packfile); preq->packfile = NULL; @@ -2517,6 +2518,18 @@ int finish_http_pack_request(struct http_pack_request *preq) else ip.no_stdout = 1; + /* + * We're about to reindex the packfile, so throw away the .idx + * we downloaded from the remote in order to prefer ours. + */ + if (strip_suffix(preq->tmpfile.buf, ".pack.temp", &len)) { + struct strbuf idx = STRBUF_INIT; + strbuf_add(&idx, preq->tmpfile.buf, len); + strbuf_addstr(&idx, ".idx"); + unlink(idx.buf); + strbuf_release(&idx); + } + if (run_command(&ip)) { ret = -1; goto cleanup; There are a few things I don't like there, though: - obviously reversing the tempfile name back to the idx is hacky. We could probably store the original idx that led us to this pack and use that. - I don't _think_ there is any case where that .idx file is precious. We wouldn't be indexing the .pack file if we didn't just download it, and we wouldn't have downloaded it if we already had a .idx/.pack pair. OTOH the name we got from the other side isn't necessarily the same one we'll use locally; we're feeding the pack via "index-pack --stdin", so it will do its own hash to come up with the name. The other side could have used a different scheme, or even be trying to intentionally trick us. So I feel like the root of the problem is that we moved the other side's "pack-1234abcd.idx" into place at all! We do not plan to use it, but only need to access it via our custom packed_git structs to see whether we want to download the matching pack. And in fact I'd suspect there are other possible bugs/trickery lurking, if the other side gives us a broken .idx that does not match its pack (our odb would now have the broken file with no matching pack). So IMHO a cleaner fix is that we should keep the stuff we download from the remote marked as temporary files, and then throw them away when we complete the fetch (rather than just expecting index-pack to overwrite them). -Peff