Jeff King <peff@xxxxxxxx> writes: > [1] I briefly ran your case under valgrind and got: > > ==5409== Invalid read of size 4 > ==5409== at 0x55F92A: nth_packed_object_offset (sha1_file.c:2464) > ==5409== by 0x55FBD4: find_pack_entry_one (sha1_file.c:2523) > ==5409== by 0x55FCF7: fill_pack_entry (sha1_file.c:2566) > ==5409== by 0x55FDE2: find_pack_entry (sha1_file.c:2604) > ==5409== by 0x5615BA: has_sha1_file_with_flags (sha1_file.c:3212) > ==5409== by 0x50FC38: has_sha1_file (cache.h:1049) > ==5409== by 0x51043B: parse_object (object.c:259) > ==5409== by 0x546BF9: get_reference (revision.c:254) > ==5409== by 0x54CA15: setup_revisions (revision.c:2342) > ==5409== by 0x4531DA: cmd_log_init_finish (log.c:156) > ==5409== by 0x453465: cmd_log_init (log.c:211) > ==5409== by 0x4547EB: cmd_log (log.c:672) > ==5409== Address 0x840244bc is not stack'd, malloc'd or (recently) free'd > > So I'd guess it's not the pack itself, but rather the .idx which is > full of nonsense values. And that's always generated from scratch > locally. After I seeing your "it's not the pack itself but rather the .idx" without looking at (rather, conciously avoiding to look at) the valgrind trace, I checked the codepath that starts from read_packed_sha1(). When we map in the .idx file, we do minimum sanity checks to make sure that .idx file itself has sorted fan-out. We do not check if the object names are sorted, so a bogus .idx could tell us that an object does not exist when it exists in the matching .pack, but that is harmless. Also an unsorted object names will not make our binary search run in circles while looking things up. We do not check if the offset of individual objects are within the corresponding .pack file, either. And I arrived at nth_packed_object_offset(), so I think I am hitting the right nail. There are a few values we read from the .idx file here that we would want to validate for sanity. off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) { const unsigned char *index = p->index_data; index += 4 * 256; if (p->index_version == 1) { return ntohl(*((uint32_t *)(index + 24 * n))); This can return an offset that goes beyond the end of the packfile, possibly leading to a read access of unmapped region. } else { uint32_t off; index += 8 + p->num_objects * (20 + 4); off = ntohl(*((uint32_t *)(index + 4 * n))); if (!(off & 0x80000000)) return off; This also can return an offset that goes beyond the end of the packfile. Otherwise 'off' at this point gives an offset into the .idx file itself, that is the location of the 8-byte offset into the packfile. index += p->num_objects * 4 + (off & 0x7fffffff) * 8; And this computation can take us beyond the end of the .idx return (((uint64_t)ntohl(*((uint32_t *)(index + 0)))) << 32) | ntohl(*((uint32_t *)(index + 4))); Or the value we read with this can take us beyond the end of the packfile. } } As long as the returned value is within the valid region of packfile, packed_object_info() and unpack_entry() should be responsible for ensuring that we do not read past the end of the data. They should both begin by calling use_pack(), which already has the "offset beyond end of packfile" check, so I think that they would successfully catch a malicious .idx file that has handcrafted bad offsets if we fixed nth_packed_object_offset(). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html