On Sun, Dec 16, 2018 at 04:14:46PM -0800, Jonathan Nieder wrote:
> Hi,
>
> Farhan Khan wrote:
> >> Farhan Khan wrote:
>
> >>> I am having trouble figuring out the boundary between two objects in
> >>> the pack file.
> [...]
> > I think the issue is, the compressed object has a fixed
> > size and git inflates it, then moves on to the next object. I am
> > trying to figure out how where it identifies the size of the object.
>
> Do you mean the compressed size or uncompressed size?
>
> It sounds to me like pack-format.txt needs to do a better job of
> distinguishing the two.  How about something like this?

I mostly wrote this based on memory (and a very quick look at
index-pack) but I think we never ever really stored compressed sizes.
The "length" field (even in loose format) is always about uncompressed
size.

-- 8< --
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index cab5bdd2ff..4fd49f61d6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -31,6 +31,11 @@ Git pack format
         is an OBJ_OFS_DELTA object
      compressed delta data
 
+     Note: The length (in bytes) is of uncompressed objects or
+     deltified representation. We're supposed to reach the end of zlib
+     stream once we have inflated the given length, otherwise it's a
+     corrupted pack file.
+
      Observation: length of each object is encoded in a variable
      length format and is not constrained to 32-bit or anything.
 
@@ -199,7 +204,8 @@ Pack file entry: <+
                 is the size before compression).
         If it is REF_DELTA, then
           20-byte base object name SHA-1 (the size above is the
-                size of the delta data that follows).
+                size of the delta data that follows, before
+                compression).
           delta data, deflated.
         If it is OFS_DELTA, then
           n-byte offset (see below) interpreted as a negative
-- 8< --
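
To make the boundary question concrete, here is a rough Python sketch
(not git's actual code) of how a reader could walk undeltified entries.
It decodes the variable-length type/size header, inflates until zlib
reports end of stream, checks the result against the recorded
uncompressed length, and takes the first byte zlib did not consume as
the start of the next entry. The function names are made up for
illustration, make_entry() exists only to build test input, and delta
entries (which carry a base name or offset before the zlib stream) are
deliberately not handled.

```python
import zlib


def parse_entry_header(buf, pos):
    """Decode the n-byte type-and-size header that starts at buf[pos]."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # 3-bit object type
    size = byte & 0x0F             # low 4 bits of the uncompressed size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def read_undeltified_entry(buf, pos):
    """Return (type, data, next_pos) for a non-delta entry at buf[pos]."""
    obj_type, size, pos = parse_entry_header(buf, pos)
    inflater = zlib.decompressobj()
    data = inflater.decompress(buf[pos:])
    # The header records the *uncompressed* size. The compressed size is
    # not stored; it falls out of where the zlib stream ends.
    if not inflater.eof or len(data) != size:
        raise ValueError("corrupt pack entry")
    next_pos = len(buf) - len(inflater.unused_data)
    return obj_type, data, next_pos


def make_entry(obj_type, payload):
    """Hypothetical helper: encode one undeltified entry for the demo."""
    size = len(payload)
    current = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    header = bytearray()
    while size:
        header.append(current | 0x80)  # more size bytes follow
        current = size & 0x7F
        size >>= 7
    header.append(current)
    return bytes(header) + zlib.compress(payload)


if __name__ == "__main__":
    # Two blob entries (type 3) back to back, without the pack header.
    entries = make_entry(3, b"hello world\n") + make_entry(3, b"x" * 5000)
    pos = 0
    while pos < len(entries):
        obj_type, data, pos = read_undeltified_entry(entries, pos)
        print(obj_type, len(data), pos)
```

Slicing buf[pos:] and inflating in one call keeps the sketch short. A
real reader would feed the stream in chunks rather than hand the whole
rest of the pack to zlib.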