Jeff King <peff@xxxxxxxx> writes: > The problem isn't actually a sha1 mismatch, though that's what > parse_object() will report. The issue is actually that the file is > truncated. So zlib does not say "this is corrupt", but rather "I need > more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both > for "I need more bytes" (in which we know we are truncated, because we > fed the whole mmap'd file in the first place) as well as "I need more > output buffer space" (which just means we should keep looping!). > > So we need to distinguish those cases. I think this is the simplest fix: > > diff --git a/sha1-file.c b/sha1-file.c > index dd0b6aa873..a7ff5fe25d 100644 > --- a/sha1-file.c > +++ b/sha1-file.c > @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream, > * see the comment in unpack_sha1_rest for details. > */ > while (total_read <= size && > + stream->avail_in > 0 && > (status == Z_OK || status == Z_BUF_ERROR)) { > stream->next_out = buf; > stream->avail_out = sizeof(buf); Hmph. If the last round consumed the final input byte and needed output space of N bytes, but only M (< N) bytes of the output space was available, then it would have reduced both avail_in and avail_out down to zero and yielded Z_BUF_ERROR, no? Or would zlib refrain from consuming that final byte (leaving avail_in to at least one) and give us Z_BUF_ERROR in such a case? > This works just by checking that we are making forward progress in the > output buffer. I think that would _probably_ be OK for this case, since > we know we have all of the input available. But in a case where we're > feeding the input in a stream, it would not be. It's possible there that > we would not create any output in one round, but would do so after > feeding more input bytes. Yes, exactly. > I think the patch I showed above addresses the root cause more directly. > I'll wrap that up in a real commit, but I think there may be some > related work: > > - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't > strictly correct, but is probably good enough). However, "git > cat-file blob 19f9c827" exits non-zero without printing anything. It > probably should complain more loudly. > > - the offending loop comes from f6371f9210. But that commit was mostly > cargo-culting other parts of sha1-file.c. I'm worried that this bug > exists elsewhere, too. I'll dig around to see if I can find other > instances. Thanks.