Re: Confused over packfile and index design

"Steven E. Harris" <seh@xxxxxxxxx> · Sun, 10 Apr 2011 16:10:46 -0400

Nicolas Pitre <nico@xxxxxxxxxxx> writes:

> So the idea is to do that once to construct the pack index and allow
> for random access once the index is available.  Accessing a particular
> object without the pack index would be extremely costly otherwise,
> especially if it is towards the end of the pack.
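To check my understanding of the one-pass indexing, here is a minimal Python sketch. It uses a made-up toy format: each entry is a 4-byte big-endian inflated size followed by a zlib stream. This is NOT Git's pack layout, and make_toy_pack, build_index and read_entry are hypothetical names. The sketch shows why the sequential pass is unavoidable. No compressed length is stored, so an entry's end is only known after inflating it.

```python
import struct
import zlib

# HYPOTHETICAL toy format, not Git's real pack layout: each entry is a
# 4-byte big-endian inflated size followed by a zlib stream. Like a pack,
# it records no compressed length, so entry boundaries are implicit.


def make_toy_pack(blobs):
    """Concatenate size headers and zlib streams into one byte string."""
    out = bytearray()
    for blob in blobs:
        out += struct.pack(">I", len(blob))
        out += zlib.compress(blob)
    return bytes(out)


def build_index(pack):
    """Scan the pack once, front to back, recording each entry's offset.

    The only way to find where an entry ends is to inflate it and see how
    much input zlib consumed, which is why this pass is expensive.
    """
    offsets = []
    pos = 0
    while pos < len(pack):
        offsets.append(pos)
        (size,) = struct.unpack_from(">I", pack, pos)
        d = zlib.decompressobj()
        data = d.decompress(pack[pos + 4:])
        if not d.eof or len(data) != size:
            raise ValueError("corrupt entry at offset %d" % pos)
        # unused_data is whatever followed this entry's zlib stream, so the
        # next entry starts that many bytes before the end of the pack.
        pos = len(pack) - len(d.unused_data)
    return offsets


def read_entry(pack, offsets, i):
    """Random access: jump straight to entry i using the recorded offsets."""
    pos = offsets[i]
    (size,) = struct.unpack_from(">I", pack, pos)
    d = zlib.decompressobj()
    # decompress() stops at the end of entry i's stream; the later entries
    # are not inflated and simply end up in d.unused_data.
    data = d.decompress(pack[pos + 4:])
    if not d.eof or len(data) != size:
        raise ValueError("corrupt entry at offset %d" % pos)
    return data
```

The offsets list plays the role of the index here. Git's real .idx files map object names to pack offsets rather than using positions, but the trade-off is the same. You pay for one full scan, then seek directly.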

Thanks for the explanation. It's clear now.

> The reason for storing only the expanded data size is to have the
> exact buffer size allocated for the inflated data.  The zlib stream
> that follows is encoded to consume only the needed data to produce the
> inflated object.  When the output buffer is all used, the zlib library
> should flag the end of the deflated stream.  If not then there is an
> error in the pack data.
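If I follow, the per-object check amounts to something like the Python sketch below. Python's zlib module wraps the same library. inflate_exact is a hypothetical helper name, not anything in Git. The function caps the output near the recorded size. It treats a size mismatch or a missing end-of-stream as corrupt pack data.

```python
import zlib


def inflate_exact(stream, expected_size):
    """Inflate one object whose inflated size is known up front.

    Returns (data, consumed): the inflated bytes and how many input bytes
    the zlib stream occupied. Raises ValueError if the stream does not
    inflate to exactly expected_size bytes and then end.
    """
    d = zlib.decompressobj()
    # Allow one spare byte so an over-long stream shows up as a size
    # mismatch; this also avoids max_length=0, which Python treats as
    # "no limit" and would break zero-length objects.
    data = d.decompress(stream, expected_size + 1)
    if len(data) != expected_size:
        raise ValueError("inflated size does not match the recorded size")
    if not d.eof:
        raise ValueError("zlib stream did not end where the data did")
    # Everything zlib did not need belongs to whatever follows this object.
    return data, len(stream) - len(d.unused_data)
```

The consumed count falls out of zlib's own judgment of where the stream ends. That is exactly the boundary the pack format never records.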

That provides some error checking, then: since the packfile format has
no delimiting or framing around each compressed stream, we have to trust
zlib to know when it has had enough input, and the recorded inflated size
gives us something to check that judgment against.

By the way, I looked over the zlib manual[1], and I see that many of the
inflating/decompressing functions require the caller to specify the
number of input bytes available up front. There is also inflateBack(),
which uses callback functions to request more data upon underflow, and
the higher-level inflate() function looks like it can be called in a
loop, refilling the input buffer upon underflow. Is Git using one of
these two functions here?
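For what it's worth, the refill-loop style maps fairly directly onto Python's zlib.decompressobj(), which calls inflate() internally. Each decompress() call below hands zlib one more chunk of input. That is roughly like refilling next_in/avail_in and calling inflate() again. This is a sketch under my own assumptions: a file-like reader, an arbitrary 16-byte chunk size, and a hypothetical function name. It makes no claim about what Git actually does.

```python
import io
import zlib


def inflate_from_reader(reader, expected_size, chunk_size=16):
    """Inflate one zlib stream from reader, feeding input in small chunks.

    Returns (data, leftover), where leftover is input that was read past
    the end of the stream and belongs to whatever follows it.
    """
    d = zlib.decompressobj()
    out = bytearray()
    while not d.eof:
        chunk = reader.read(chunk_size)  # "refill" the input buffer
        if not chunk:
            raise ValueError("input ended before the zlib stream did")
        out += d.decompress(chunk)
        if len(out) > expected_size:
            raise ValueError("stream inflates to more than the recorded size")
    if len(out) != expected_size:
        raise ValueError("stream inflates to fewer bytes than recorded")
    return bytes(out), d.unused_data
```

Because zlib itself decides when the stream is finished, the caller never needs the compressed length. It just keeps supplying input until zlib reports the end.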

[...]

> When in doubt, the code is always the ultimate source of information.

Yes, I need to learn my way around in there to find the call sites
relevant to this discussion.

Footnotes:
[1] http://www.zlib.net/manual.html

-- 
Steven E. Harris
