Shawn Pearce <spearce@xxxxxxxxxxx> writes: > This is an updated draft after discussion on list with Peff, Michael > Haggerty, and Dave Borowitz. > > You can read a rendered version of this here: > https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md > > Biggest changes from the first proposal are: > > - reflog is now integrated into reftable > - block headers slightly different > - Peff's stack management idea is used > - Michael's compaction idea is used Just a few comments. > A variable number of 4-byte `restart_offset` values follows the > records. Offsets are relative to the start of the block (0 in first > block to include file header) and refer to the first byte of any > `ref_record` whose name has not been prefixed compressed. Readers can > start linear scans from any of these records. It is unclear what "0 in first block to include file header" wants to say. Do I write "8" if I want to express the offset of the first record in the first block, or do I write "0"? > The 2-byte `number_of_restarts + 1` stores the number of entries in > the `restart_offset` list. It is unclear whose responsibility it is to add this "1". Does this mean a reader thinks there is one entry in the restart table when it sees "0" in the number_of_restarts field (hence you can have max 65536 entries in total)? > Readers can use the restart count to binary search between restarts > before starting a linear scan. The `number_of_restarts` field must be > the last 2 bytes of the block as specified by `block_len` from the > block header. Does the new term "restart count" mean the same thing as number_of_restarts? > ### Log block format > > A log block is written as: > > 'g' > uint24( block_len ) > zlib_deflate { > log_record* > int32( restart_offset )* > int16( number_of_restarts ) > } > > Log blocks look similar to ref blocks, except `block_type = 'g'`. Is there a general recommended strategy for writers to choose how many entries to include in a single physical block? I understand that the deflated whole must fit in the physical block whose length is defined in the footer of the whole file, and in general you would not know how small the data deflates down to before compressing, right? > Log record keys are structured as: > > ref_name '\0' reverse_int32( time_sec ) > > where `time_sec` is the update time in seconds since the epoch. The > `reverse_int32` function inverses the value so lexographical ordering > the network byte order time sorts more recent records first: > > reverse_int(int32 t) { > return 0xffffffff - t; > } Is 2038 an issue, or by that time we'd all be retired together with this file format and it won't be our problem? As the file format uses delta compression with restarts, a reader needs to sequencially scan some bounded number of entries to get the contents of a specific entry anyway, so I am wondering if it is worth storing a longer timestamp in varint() for an restart entry and express the timestamp on delta entries as difference to the previous entry. > ### Log index > > The log index stores the log key (`refname \0 reverse_int32(time_sec)`) > for the last log record of every log block in the file, supporting > bounded-time lookup. This assumes that timestamps never wildly rewind in the reflog, doesn't it? Is that a sensible assumption? Or does "the last log record" in the above mean "the log record with largest timestamp? ... ah, not that is still not sufficient. You'd still need to assume that timestamps on entries in one log block must be all older than the ones on entries in later log blocks. Hmph... Also it is not clear to me how reflogs for two refs would be intermixed in the log blocks, and what log keys for the entries must be recorded in the log index, to facilitate efficient lookup. Is it assumed that a consecutive sequence of log blocks record reflogs for the same ref, before the sequence of log blocks switch to record reflogs for another ref, or something? > A log index block must be written if 2 or more log blocks are written > to the file. If present, the log index appears after the first log > block. There is no padding used to align the log index to block > alignment. > > Log index format is identical to ref index, except the keys are 5 > bytes longer to include `'\0'` and the 4-byte `reverse_int32(time)`. > Records use `block_offset` to refer to the start of a log block. I am assuming that we do not care about being able to quickly determine master@{24028}; I would imagine that it wouldn't be too hard to add an index to help such query, but I offhand would not know the details until I figure out how the format handles reflog entries for multiple refs first.