Re: [WIP v2 0/2] Modifying pack objects to support --blob-max-bytes

Jeff King <peff@xxxxxxxx> · Wed, 7 Jun 2017 05:46:41 -0400

On Mon, Jun 05, 2017 at 10:35:23AM -0700, Jonathan Tan wrote:

> > The rest of the pack code uses a varint encoding which is generally
> > much smaller than a uint64 for most files, but can handle arbitrary
> > sizes.
> > 
> > The one thing it loses is that you wouldn't have a fixed-size record, so
> > if you were planning to dump this directly to disk and binary-search it,
> > that won't work. OTOH, you could make pseudo-pack-entries and just
> > index them along with the rest of the objects in the pack .idx.
> > 
> > The one subtle thing there is that the pseudo-entry would have to say
> > "this is my sha1". And then we'd end up repeating that sha1 in the .idx
> > file. So it's optimal on the network but wastes 20 bytes on disk (unless
> > index-pack throws away the in-pack sha1s as it indexes, which is
> > certainly an option).
> 
> If we end up going with the varint approach (which seems reasonable),
> maybe the client could just expand the varints into uint64s so that it
> has a binary-searchable file. I think it's better to keep this list
> separate from the pack .idx file (there has been some discussion on this
> - [1] and its replies).
> 
> [1] https://public-inbox.org/git/777ab8f2-c31a-d07b-ffe3-f8333f408ea1@xxxxxxxxxxxxxxxxx/

OK. If we're keeping it separate anyway, then I agree that just
expanding the varints is a good solution. And we don't have to care too
much about the local storage, because it can be changed out later
without touching the on-the-wire protocol.
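
To make "expanding the varints" concrete, here is a rough sketch in Python.
It is illustrative only and rests on several assumptions: the wire list is
taken to be a concatenation of (20-byte sha1, varint size) entries, the
varint is a generic little-endian base-128 encoding (not necessarily the
exact one the pack code uses), and the names encode_varint, decode_varint,
expand and lookup are hypothetical. The client decodes each entry once and
writes fixed 28-byte records sorted by sha1, which can then be
binary-searched.

```python
import struct

# Assumed local record layout: 20-byte SHA-1 followed by a big-endian uint64.
RECORD = struct.Struct(">20sQ")


def encode_varint(n):
    """Encode n as a little-endian base-128 varint (assumed wire format)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint from buf at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def expand(wire):
    """Turn variable-length wire entries into a sorted fixed-size table."""
    entries = []
    pos = 0
    while pos < len(wire):
        sha1 = wire[pos:pos + 20]
        size, pos = decode_varint(wire, pos + 20)
        entries.append((sha1, size))
    entries.sort()
    return b"".join(RECORD.pack(sha1, size) for sha1, size in entries)


def lookup(table, sha1):
    """Binary-search the fixed-size table; return the size or None."""
    lo, hi = 0, len(table) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, size = RECORD.unpack_from(table, mid * RECORD.size)
        if key == sha1:
            return size
        if key < sha1:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Since the table is built locally from whatever arrives on the wire, its
layout could later be swapped for something else without the server
noticing.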

-Peff