On Thu, Jul 13, 2017 at 05:27:44PM -0700, Shawn Pearce wrote:

> > We _could_ consider gzipping individual blocks of
> > a reftable (or any structure that allows you to search to a
> > constant-sized block and do a linear search from there). But given that
> > they're in the same ballpark, I'm happy with whatever ends up the
> > simplest to code and debug. ;)
>
> This does help to shrink the file, e.g. it drops from 28M to 23M.
>
> It makes it more CPU costly to access a block, as we have to inflate
> that to walk through the records. It also messes with alignment. When
> you touch a block, that may be straddling two virtual memory pages in
> your kernel/filesystem.
>
> I'm not sure those penalties are worth the additional 16% reduction in size.

Yeah, I don't really care about a 16% reduction in size. I care much
more about simplicity of implementation and debugging.

Using zlib is kind-of simple to implement. But if you've ever had to
debug it (or figure out what is going on with maybe-corrupted output),
it's pretty nasty. So I don't mind a more readable custom compression if
it's not too complicated. And especially if it buys us extra performance
by being able to jump around non-sequentially in the block.

-Peff
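
For anyone following along, here is a rough sketch of the "deflate each
block separately, search to a block, then scan linearly" idea discussed
above. It is purely illustrative and is not the reftable format: the
names build_blocks, lookup, and BLOCK_RECORDS are made up for this
example, and blocks are sized by record count rather than by bytes to
keep it short. Only the zlib calls are real library API.

```python
import bisect
import zlib

# Hypothetical block size, counted in records to keep the sketch small.
# A real on-disk format would more likely size blocks in bytes.
BLOCK_RECORDS = 4


def build_blocks(refs):
    """Split sorted (name, value) pairs into blocks and deflate each one.

    Returns (first_keys, blocks), where first_keys[i] is the first ref
    name stored in blocks[i]. first_keys acts as the uncompressed index.
    """
    refs = sorted(refs)
    first_keys, blocks = [], []
    for i in range(0, len(refs), BLOCK_RECORDS):
        chunk = refs[i:i + BLOCK_RECORDS]
        raw = "".join(f"{name} {value}\n" for name, value in chunk).encode()
        first_keys.append(chunk[0][0])
        blocks.append(zlib.compress(raw))
    return first_keys, blocks


def lookup(first_keys, blocks, name):
    """Find the candidate block, inflate only that block, scan it linearly."""
    i = bisect.bisect_right(first_keys, name) - 1
    if i < 0:
        return None  # name sorts before the first record
    # This inflate is the per-access CPU cost mentioned in the thread.
    for line in zlib.decompress(blocks[i]).decode().splitlines():
        key, value = line.split(" ", 1)
        if key == name:
            return value
    return None
```

Note that every lookup has to inflate a whole block before it can look
at a single record, so there is no jumping around inside the block
without decompressing it first. That is the part a simpler custom
encoding could avoid.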