Re: Bug report: v2.47.0 cannot fetch version 1 pack indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Oct 20, 2024 at 01:22:34AM +0200, fox wrote:

> Running git-bisect identifies b1b8dfde6929ec9463eca0a858c4adb9786d7c93
> as the first bad commit, suggesting that the .idx file downloaded from
> the remote is now expected to be byte-for-byte identical with a
> locally-generated version; due to format differences, they are not.
> 
> The remote idx is in the original (version 1) format, and git
> verify-pack seems satisfied with it.  Did v2.47.0 intend to block
> fetching such indices?

Interesting case. No, we did not intend to block fetching those indices.

We knew there was a potential problem with dumb http here, but the idea
was that if we had a local file "foo.idx" (or "foo.pack"), that we'd
never try to download the other side in the first place, preferring
ours.

And AFAICT that does hold. But here we have the opposite problem! We
download the pack idx from the remote first, so that we know what it
claims to have (which tells us whether we want to grab the matching
pack). And then we grab the matching pack, but instead of using the .idx
we downloaded, we index it again ourselves.

Which is probably a good idea to make sure that we prefer our
locally-checked copy versus what the other side has. But obviously there
is no guarantee that we'll come up with the same bytes as the other side
if there are version or configuration differences.

One option is to just discard the remote version before re-indexing,
like this:

diff --git a/http.c b/http.c
index d59e59f66b..6879d7ba1b 100644
--- a/http.c
+++ b/http.c
@@ -2500,6 +2500,7 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	struct child_process ip = CHILD_PROCESS_INIT;
 	int tmpfile_fd;
 	int ret = 0;
+	size_t len;
 
 	fclose(preq->packfile);
 	preq->packfile = NULL;
@@ -2517,6 +2518,18 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	else
 		ip.no_stdout = 1;
 
+	/*
+	 * We're about to reindex the packfile, so throw away the .idx
+	 * we downloaded from the remote in order to prefer ours.
+	 */
+	if (strip_suffix(preq->tmpfile.buf, ".pack.temp", &len)) {
+		struct strbuf idx = STRBUF_INIT;
+		strbuf_add(&idx, preq->tmpfile.buf, len);
+		strbuf_addstr(&idx, ".idx");
+		unlink(idx.buf);
+		strbuf_release(&idx);
+	}
+
 	if (run_command(&ip)) {
 		ret = -1;
 		goto cleanup;


There are a few things I don't like there, though:

  - obviously reversing the tempfile name back to the idx is hacky. We
    could probably store the original idx that led us to this pack and
    use that.

  - I don't _think_ there is any case where that .idx file is precious.
    We wouldn't be indexing the .pack file if we didn't just download
    it, and we wouldn't have downloaded it if we already had a
    .idx/.pack pair. OTOH the name we got from the other side isn't
    necessarily the same one we'll use locally; we're feeding the pack
    via "index-pack --stdin", so it will do its own hash to come up with
    the name. The other side could have used a different scheme, or even
    be trying to intentionally trick us.

So I feel like the root of the problem is that we moved the other side's
"pack-1234abcd.idx" into place at all! We do not plan to use it, but
only need to access it via our custom packed_git structs to see whether
we want to download the matching pack. And in fact I'd suspect there are
other possible bugs/trickery lurking, if the other side gives us a
broken .idx that does not match its pack (our odb would now have the
broken file with no matching pack).

So IMHO a cleaner fix is that we should keep the stuff we download from
the remote marked as temporary files, and then throw them away when we
complete the fetch (rather than just expecting index-pack to overwrite
them).

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux