Re: reftable: new ref storage format

Jeff King <peff@xxxxxxxx> · Fri, 14 Jul 2017 16:10:41 -0400

On Thu, Jul 13, 2017 at 05:27:44PM -0700, Shawn Pearce wrote:

> > We _could_ consider gzipping individual blocks of
> > a reftable (or any structure that allows you to search to a
> > constant-sized block and do a linear search from there). But given that
> > they're in the same ballpark, I'm happy with whatever ends up the
> > simplest to code and debug. ;)
> 
> This does help to shrink the file, e.g. it drops from 28M to 23M.
> 
> It makes it more CPU costly to access a block, as we have to inflate
> that to walk through the records. It also messes with alignment. When
> you touch a block, that may be straddling two virtual memory pages in
> your kernel/filesystem.
> 
> I'm not sure those penalties are worth the additional 16% reduction in size.

Yeah, I don't really care about a 16% reduction in size. I care much
more about simplicity of implementation and debugging. Using zlib is
kind of simple to implement. But if you've ever had to debug it (or
figure out what is going on with maybe-corrupted output), it's pretty
nasty.

So I don't mind a more readable custom compression if it's not too
complicated. And especially if it buys us extra performance by being
able to jump around non-sequentially in the block.
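
As a rough illustration of the kind of scheme that would allow this (a
sketch only, not the reftable format itself): prefix-compress the sorted
ref names within a block, but store a full name every few records so a
reader can binary-search those "restart" records and only scan linearly
from the nearest one. The names `encode_block`, `lookup`, and
`RESTART_INTERVAL` are made up for this example, and the records are kept
as Python tuples rather than packed bytes.

```python
import bisect

RESTART_INTERVAL = 4  # hypothetical value: every 4th record stores its full name


def encode_block(names, interval=RESTART_INTERVAL):
    """Prefix-compress sorted names; return (records, restarts).

    Each record is (shared_prefix_len, suffix). Records at restart
    positions always have shared_prefix_len == 0, so they can be read
    without looking at any earlier record.
    """
    records, restarts = [], []
    prev = ""
    for i, name in enumerate(names):
        shared = 0
        if i % interval == 0:
            restarts.append(i)
        else:
            limit = min(len(prev), len(name))
            while shared < limit and prev[shared] == name[shared]:
                shared += 1
        records.append((shared, name[shared:]))
        prev = name
    return records, restarts


def lookup(records, restarts, wanted):
    """Return the index of `wanted` in the block, or -1 if absent."""
    # Restart records hold full names, so they can be binary-searched.
    keys = [records[r][1] for r in restarts]
    pos = bisect.bisect_right(keys, wanted) - 1
    if pos < 0:
        return -1  # sorts before the first name in the block
    start = restarts[pos]
    end = restarts[pos + 1] if pos + 1 < len(restarts) else len(records)
    # Linear scan, rebuilding each name from the previous one.
    name = ""
    for i in range(start, end):
        shared, suffix = records[i]
        name = name[:shared] + suffix
        if name == wanted:
            return i
        if name > wanted:
            break  # names are sorted, so it cannot appear later
    return -1
```

The records stay human-readable (a length plus a literal suffix), which
is the debugging advantage over an opaque deflate stream, and a lookup
touches at most one restart interval instead of inflating the whole
block.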

-Peff