On Tue, Aug 15, 2017 at 11:15 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> On Tue, Aug 15, 2017 at 7:48 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> 7th iteration of the reftable storage format.
>>
>> You can read a rendered version of this here:
>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>
>> Changes from v6:
>> - Blocks are variable sized, and alignment is optional.
>> - ref index is required on variable sized multi-block files.
>>
>> - restart_count/offsets are again at the end of the block.
>> - value_type = 0x3 is only for symbolic references.
>> - "other" files cannot be stored in reftable.
>>
>> - object blocks are explicitly optional.
>> - object blocks use position (offset in bytes), not block id.
>> - removed complex log_chained format for log blocks
>>
>> - Layout uses log, ref file extensions
>> - Described reader algorithm to obtain a snapshot
>
> - back to the old "intra-block index is last"
>   for all block types. ok.

Yes, it simplifies "streaming writers" who don't want to buffer a lot.

> - changed (only ref?) indexes to start char + 3 byte size:
>   Which starting char do object/log indexes have?

All index blocks use 'i'.

> "Unaligned files must include the ref index to support fast lookup."
>
> Why this? I would imagine the client (which has ~5 branches),
> would not need this, but only a ref block, that's it.

The quoted part is, I think, incomplete. Unaligned files need the ref
index if there is more than one ref block, as there is no way to
divide the space for binary search. A single ref block with 5 branches
does not need the ref index.

> Ctrl-F for 'block_size' reveals nothing is computed
> relative to the block_size in this format, yet we can
> set it to an arbitrary number. If following the spec,
> the reader at $DAY_JOB needs to be able to read
> both aligned and unaligned reftables, despite our plan
> to ever write aligned ref tables, what would the reader
> use the block_size for? (I think we can omit that field
> from the header/footer now, no?)

It's really helpful to have it present so the reader knows how to
locate and read blocks. If the ref index is missing and there are
multiple ref blocks in an aligned file, a reader can use block_size to
divide the space and perform binary search. Even when the ref index is
present, the reader can use block_size to issue a disk IO read of
block_size bytes without reading the block_len of the target block
first.

At $DAY_JOB the block_size is tunable by the writer and could change
at any time, so it's useful to have it embedded in the output.
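
To make the "divide the space" case concrete, here is a rough sketch in Python of how a reader might binary search an aligned file that has no ref index. It is not the JGit reader, and it does not parse the real block encoding. The names `find_block_aligned`, `read_block`, and `toy_blocks` are made up for illustration, and the sketch assumes the ref blocks start at offset 0 and that each block decodes to a sorted list of ref names.

```python
from typing import Callable, List, Optional


def find_block_aligned(
    target: str,
    ref_region_len: int,
    block_size: int,
    read_block: Callable[[int, int], List[str]],
) -> Optional[int]:
    """Return the byte offset of the block that could hold `target`.

    Returns None if `target` sorts before the first name of every block.
    `read_block(offset, length)` is a hypothetical callback that reads
    `length` bytes at `offset` and returns that block's sorted ref names.
    """
    # Aligned blocks all start on a multiple of block_size, so the block
    # count follows from the region length alone (rounding up).
    num_blocks = (ref_region_len + block_size - 1) // block_size

    lo, hi = 0, num_blocks - 1
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # One read of exactly block_size bytes; no need to read the
        # block's own length field first.
        names = read_block(mid * block_size, block_size)
        if names[0] <= target:
            found = mid * block_size  # candidate; a later block may also fit
            lo = mid + 1
        else:
            hi = mid - 1
    return found


# Toy data standing in for decoded blocks, keyed by byte offset.
BLOCK_SIZE = 4096
toy_blocks = {
    0: ["refs/heads/maint", "refs/heads/master"],
    4096: ["refs/heads/next", "refs/heads/pu"],
    8192: ["refs/tags/v2.14.0", "refs/tags/v2.14.1"],
}
reads = []


def read_block(offset: int, length: int) -> List[str]:
    reads.append((offset, length))
    return toy_blocks[offset]


offset = find_block_aligned("refs/heads/pu", 3 * BLOCK_SIZE, BLOCK_SIZE, read_block)
print(offset)  # 4096
```

Without block_size in the header, the `mid * block_size` step has nothing to work with, which is why an unaligned multi-block file needs the ref index instead.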