I'm getting a "serious inflate inconsistency" error while running "git
verify-pack" (actually, "git index-pack --verify" under the hood). It
bisects to 4614043 (index-pack: use streaming interface for collision
test on large blobs, 2012-05-24). The interesting thing about this
repository is that it has a 2.8G text file in it which compresses down
to only about 420M.

I'm not sure that 4614043 actually introduces the bug; it may just
trigger the code path. I'm able to reproduce it with the following
script:

  # empty repo...
  git init repo && cd repo &&

  # set this low to make sure we follow the unpack_data code path
  git config core.bigfilethreshold 100k &&

  # now make a file bigger than our threshold, but that will compress
  # well
  perl -le 'print for (1..100000)' >file &&

  # and then make a commit
  git add file &&
  git commit -m file &&

  # and a pack with it
  git repack -ad &&

  # and then verify that pack
  git verify-pack .git/objects/pack/*.pack

The problem seems to be in index-pack.c:unpack_data, which does this:

> 	git_inflate_init(&stream);
> 	stream.next_out = data;
> 	stream.avail_out = consume ? 64*1024 : obj->size;
>
> 	do {
> 		unsigned char *last_out = stream.next_out;
> 		ssize_t n = (len < 64*1024) ? len : 64*1024;
> 		n = pread(pack_fd, inbuf, n, from);
> 		if (n < 0)
> 			die_errno(_("cannot pread pack file"));
> 		if (!n)
> 			die(Q_("premature end of pack file, %lu byte missing",
> 			       "premature end of pack file, %lu bytes missing",
> 			       len),
> 			    len);
> 		from += n;
> 		len -= n;
> 		stream.next_in = inbuf;
> 		stream.avail_in = n;
> 		status = git_inflate(&stream, 0);
> 		if (consume) {
> 			if (consume(last_out, stream.next_out - last_out, cb_data)) {
> 				free(inbuf);
> 				free(data);
> 				return NULL;
> 			}
> 			stream.next_out = data;
> 			stream.avail_out = 64*1024;
> 		}
> 	} while (len && status == Z_OK && !stream.avail_in);
>
> 	/* This has been inflated OK when first encountered, so... */
> 	if (status != Z_STREAM_END || stream.total_out != obj->size)
> 		die(_("serious inflate inconsistency"));

We limit ourselves to handling just 64K at a time. We read in 64K of
compressed data and hand it to zlib via next_in/avail_in, and we give
zlib a 64K output buffer to write into via next_out/avail_out. Because
the data compresses so well, zlib fills up the 64K output buffer after
consuming only 28K or so of the input and returns. We call consume on
that chunk, but when we hit the outer loop condition, stream.avail_in
still holds the 36K or so we haven't processed yet, so the loop ends
with status == Z_OK, which triggers the die() below it.

So I don't really understand what this !stream.avail_in is doing there
in the do-while loop. Don't we instead need an inner loop that keeps
feeding the result of each pread into git_inflate until we don't have
any available input left? Something like the patch below, which seems
to work for me, but I still don't understand the function of the
!stream.avail_in check in the outer loop.
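For illustration, here's a tiny standalone sketch using plain zlib
rather than git's git_inflate wrappers (the buffer sizes, names, and
data are invented for the example). It feeds one whole compressed chunk
and shows that a single inflate() call can return Z_OK with avail_in
still nonzero when the output window fills first, so the caller has to
keep calling inflate() on the same chunk, draining the output window
each time, until the input is used up or the stream ends:

  /*
   * Not git code; a standalone illustration with plain zlib.
   * Compile with: cc demo.c -lz
   */
  #include <stdio.h>
  #include <string.h>
  #include <zlib.h>

  int main(void)
  {
  	static unsigned char orig[256 * 1024];   /* highly compressible */
  	static unsigned char packed[64 * 1024];
  	static unsigned char window[16 * 1024];  /* small output window */
  	uLongf packed_len = sizeof(packed);
  	z_stream stream;
  	size_t total_out = 0;
  	int status;

  	memset(orig, 'x', sizeof(orig));
  	if (compress(packed, &packed_len, orig, sizeof(orig)) != Z_OK)
  		return 1;

  	memset(&stream, 0, sizeof(stream));
  	if (inflateInit(&stream) != Z_OK)
  		return 1;

  	/* feed one whole compressed chunk, like one pread() worth */
  	stream.next_in = packed;
  	stream.avail_in = (uInt)packed_len;

  	do {
  		stream.next_out = window;
  		stream.avail_out = sizeof(window);
  		status = inflate(&stream, Z_NO_FLUSH);
  		if (status != Z_OK && status != Z_STREAM_END)
  			return 1;
  		/* this is where index-pack would call consume() */
  		total_out += sizeof(window) - stream.avail_out;
  		printf("status=%d, input still pending=%u\n",
  		       status, stream.avail_in);
  	} while (status == Z_OK && stream.avail_in);

  	printf("inflated %lu bytes\n", (unsigned long)total_out);
  	inflateEnd(&stream);
  	return status == Z_STREAM_END ? 0 : 1;
  }

In index-pack's case the window is the 64K buffer handed to consume(),
and that inner drain-the-input loop is what the patch below adds inside
the existing read loop.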
-Peff

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8b5c1eb..0db1923 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -538,15 +538,19 @@ static void *unpack_data(struct object_entry *obj,
 		len -= n;
 		stream.next_in = inbuf;
 		stream.avail_in = n;
-		status = git_inflate(&stream, 0);
-		if (consume) {
-			if (consume(last_out, stream.next_out - last_out, cb_data)) {
-				free(inbuf);
-				free(data);
-				return NULL;
-			}
-			stream.next_out = data;
-			stream.avail_out = 64*1024;
+		if (!consume)
+			status = git_inflate(&stream, 0);
+		else {
+			do {
+				status = git_inflate(&stream, 0);
+				if (consume(last_out, stream.next_out - last_out, cb_data)) {
+					free(inbuf);
+					free(data);
+					return NULL;
+				}
+				stream.next_out = data;
+				stream.avail_out = 64*1024;
+			} while (status == Z_OK && stream.avail_in);
 		}
 	} while (len && status == Z_OK && !stream.avail_in);
 