On Sun, Sep 25, 2016 at 05:10:31PM -0700, Junio C Hamano wrote:

> Gustavo Grieco <gustavo.grieco@xxxxxxx> writes:
>
> > We found a stack read out-of-bounds parsing object files using git 2.10.0. It was tested on ArchLinux x86_64. To reproduce, first recompile git with ASAN support and then execute:
> >
> > $ git init ; mkdir -p .git/objects/b2 ; printf 'x' > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0
>
> Interesting. If you prepare such a broken loose object file in your
> local repository, I would expect that either unpack_sha1_header() or
> unpack_sha1_header_to_strbuf() that sha1_loose_object_info() calls
> would detect and barf by noticing that an error came from libz while
> it attempts to inflate and would not even call parse_sha1_header.
>
> But it is nevertheless bad to assume that whatever happens to
> inflate without an error must be formatted correctly to allow
> parsing (i.e. has ' ' and then NUL termination within the first 32
> bytes after inflation), which is exactly what the hdr[32] is saying.

Yeah. I also was surprised that we didn't barf on a zlib failure. But
based on previous debugging of corrupted zlib data, my recollection is
that there are a large number of weird corruptions that zlib will
happily pass back and only later complain about a checksum error.

So presumably "x" is one of those, and it might not hold for other
corruptions (but I didn't try).

> Note that this is totally unteseted and not thought through; I
> briefly thought about what unpack_sha1_header_to_strbuf() does with
> this change (it first lets unpack_sha1_header() to attempt with a
> small buffer but it seems to discard the error code from it before
> seeing if the returned buffer has NUL in it); there may be bad
> interactions with it.

Yeah, that seems wrong. I don't think it would involve an out of bounds
read, but we probably could fail to correctly report zlib corruption.

> diff --git a/sha1_file.c b/sha1_file.c
> index 60ff21f..dfcbd76 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1648,6 +1648,8 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
>
>  int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
>  {
> +	int status;
> +
>  	/* Get the data stream */
>  	memset(stream, 0, sizeof(*stream));
>  	stream->next_in = map;
> @@ -1656,7 +1658,15 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>  	stream->avail_out = bufsiz;
>
>  	git_inflate_init(stream);
> -	return git_inflate(stream, 0);
> +	status = git_inflate(stream, 0);
> +	if (status)
> +		return status;
> +
> +	/* Make sure we got the terminating NUL for the object header */
> +	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
> +		return -1;
> +
> +	return 0;

This doesn't look too invasive as an approach, though I would have done
it differently. We're making the assumption that once there is a NUL,
the header-parser won't do anything stupid, which creates a coupling
between those two bits of code.

My inclination would have been to just treat the header as a ptr/len
pair, and make sure the parser never reads past the end. But I
implemented that, and it _is_ rather invasive. And it's not like
coupling unpack_sha1_header() and parse_sha1_header() is all that
terrible; they are meant to be paired.

I haven't read through your follow-up yet; I'll do that before posting
my version.

>  static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
> @@ -1758,6 +1768,8 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
>  		char c = *hdr++;
>  		if (c == ' ')
>  			break;
> +		if (!c)
> +			die("invalid object header");
>  		type_len++;
>  	}

We keep reading from hdr after this, though I think those bits would all
bail correctly on seeing NUL.

-Peff
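
For illustration only, here is a rough sketch of the ptr/len idea
mentioned above: a parser for a "<type> <size>" header followed by NUL
that never reads past the bytes it was handed. This is not the version
referred to in this message and not git's parse_sha1_header(); the
function name parse_header_bounded(), its parameters, and its return
convention are all made up for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): parse "<type> <size>\0" from
 * hdr[0..len) without ever reading at or beyond hdr + len.
 * Returns 0 on success, -1 on any malformed or truncated header.
 */
static int parse_header_bounded(const char *hdr, size_t len,
				char *type, size_t type_cap,
				unsigned long *size)
{
	const char *end = hdr + len;
	const char *sp = memchr(hdr, ' ', len);
	const char *p;
	unsigned long n = 0;

	/* Need a non-empty type name that fits in the caller's buffer. */
	if (!sp || sp == hdr || (size_t)(sp - hdr) >= type_cap)
		return -1;
	/* A NUL before the space means the header ended early. */
	if (memchr(hdr, '\0', sp - hdr))
		return -1;
	memcpy(type, hdr, sp - hdr);
	type[sp - hdr] = '\0';

	/* At least one decimal digit must follow the space. */
	p = sp + 1;
	if (p == end || *p < '0' || *p > '9')
		return -1;
	for (; p < end && *p >= '0' && *p <= '9'; p++) {
		unsigned long digit = *p - '0';
		if (n > (ULONG_MAX - digit) / 10)
			return -1; /* size would overflow */
		n = n * 10 + digit;
	}

	/* The size must be terminated by a NUL inside the buffer. */
	if (p == end || *p != '\0')
		return -1;
	*size = n;
	return 0;
}
```

A caller would pass the number of bytes that inflate actually produced
as len, so a one-byte result like the "x" case is rejected instead of
being scanned for a space that is not there. The sketch leaves out
details a real parser cares about, such as validating the type name.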