Re: corrupt object memory allocation error

Jeff King <peff@xxxxxxxx> · Wed, 20 Nov 2013 16:33:48 -0500

On Wed, Nov 20, 2013 at 04:33:50PM -0400, Joey Hess wrote:

> I've got a git repository of < 2 mb, where git wants to
> allocate a rather insane amount of memory:
> 
> >git fsck
> Checking object directories: 100% (256/256), done.
> fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
> 
> > git show 11644b5a075dc1425e01fbba51c045cea2d0c408
> fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
> 
> The problem seems to be the attached object file, which has gotten
> corrupted, presumably in the header that git reads to see how large it
> is. Thought I'd report this in case there is some easy way to
> add a sanity check.

Definitely a corrupt object. The start is not a valid zlib header, so we
guess that it is an "experimental loose object". This is a format that
git wrote for a very short period as a performance experiment; it didn't
pan out and we no longer write it.
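
For illustration, here is a minimal sketch of what "a valid zlib header"
means according to RFC 1950. The function name is made up for this
example, and this is not git's actual code (git's check is written in C
and may differ in detail):

```python
def looks_like_zlib_header(data: bytes) -> bool:
    """Return True if the first two bytes form a valid RFC 1950 header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble of CMF is the compression method; 8 means deflate.
    method_ok = (cmf & 0x0F) == 8
    # High nibble of CMF is log2(window size) - 8; values above 7 are invalid.
    window_ok = (cmf >> 4) <= 7
    # CMF and FLG, read as a 16-bit big-endian number, must be a multiple of 31.
    check_ok = ((cmf << 8) | flg) % 31 == 0
    return method_ok and window_ok and check_ok
```

A loose object starting with 0x78 0x01 passes this test; arbitrary junk
usually fails it, and that is when the fallback guess kicks in.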

The experimental format contains the (purported) object size outside of
the checksum'd zlib data (whereas the normal format has a human-readable
header that gets zlib'd). Your corrupted bytes end up specifying a
ridiculously large size.
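
To show how a few junk bytes turn into a huge size, here is a rough
sketch of a decoder for this kind of header. It assumes the pack-style
variable-length encoding (object type in bits 4-6 of the first byte, low
four size bits in the same byte, then seven more size bits per
continuation byte). The function name and the sample bytes are invented
for the example; they are not the bytes from the attached object.

```python
def decode_size_header(data: bytes):
    """Decode a pack-style header; return (type, size, bytes_used)."""
    if not data:
        raise ValueError("empty input")
    byte = data[0]
    obj_type = (byte >> 4) & 0x07   # bits 4-6 carry the object type
    size = byte & 0x0F              # bits 0-3 carry the low size bits
    shift = 4
    used = 1
    # The high bit of each byte says whether another size byte follows.
    while byte & 0x80:
        if used >= len(data):
            raise ValueError("truncated header")
        byte = data[used]
        size |= (byte & 0x7F) << shift
        shift += 7
        used += 1
    return obj_type, size, used
```

With this encoding nothing validates the result: the five bytes
bf ff ff ff 7f decode to a size of 4294967295, and each extra
continuation byte adds seven more bits.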

I wonder if it is time to drop reading support for the experimental
objects. The format was never widely used, and was deprecated in v1.5.2
by 726f852 (deprecate the new loose object header format, 2007-05-09).
Dropping it would improve the case when the initial bytes of a loose
object are corrupted, because we would complain about the bogus zlib
data before trying to allocate the buffer.
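
A sketch of that ordering, assuming only the normal format (a zlib
stream whose inflated data begins with "<type> <size>" and a NUL byte).
The function name and the 32-byte limit are arbitrary choices for the
example, not anything taken from git:

```python
import zlib


def read_loose_header(raw: bytes, max_header: int = 32):
    """Inflate just the start of a loose object and parse its header.

    Raises ValueError before any size-dependent allocation is attempted.
    """
    inflater = zlib.decompressobj()
    try:
        # Inflate at most max_header bytes; bogus zlib data fails here.
        head = inflater.decompress(raw, max_header)
    except zlib.error as exc:
        raise ValueError(f"corrupt loose object: {exc}") from None
    nul = head.find(b"\0")
    if nul < 0:
        raise ValueError("corrupt loose object: no header terminator")
    kind, _, size = head[:nul].partition(b" ")
    if not size.isdigit():
        raise ValueError("corrupt loose object: bad size field")
    return kind.decode(), int(size)
```

Only once this returns would a caller consider allocating a buffer of
the reported size.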

The problem would still remain for packfiles, which use a similar
encoding, but I suspect it is less common there. For a single-byte
corruption, it is unlikely to be right in the length header. But for
absolute junk that is not git data at all, the first bytes are very
likely to be corrupted. In the pack case, we would notice early that it
does not look like a packfile; for the loose object, we have no such
header and proceed with the allocation.
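
For comparison, a minimal sketch of the early check that a packfile
allows. It relies on the 12-byte pack header: the signature "PACK", a
4-byte version (2 or 3), and a 4-byte object count, all in network byte
order. The function name is hypothetical.

```python
import struct


def check_pack_header(data: bytes) -> int:
    """Validate a 12-byte pack header and return the object count."""
    if len(data) < 12:
        raise ValueError("too short to be a packfile")
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return count
```

Junk that is not git data at all fails the signature test right away. A
loose object has no such magic number to test.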

As for your specific corruption, I can't make heads or tails of it. It
is not a single-bit error. The first two bytes of a loose object should
always be <0x78, 0x01>, which is the standard zlib deflate header. Your
bytes aren't even close, and decoding the rest with a corrupted zlib
header seems fruitless.

You don't happen to have another copy of the object (or of the data
contained in the object, such as the working tree file), do you? It
might be interesting to see a comparison of the bytes of the correct
data and your corruption.

-Peff