"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx> > > Version appends a hash ID to the file header, making it slightly larger. > > This commit also changes "SHA-1" into "object ID" in many places. > > Signed-off-by: Han-Wen Nienhuys <hanwen@xxxxxxxxxx> > --- > Documentation/technical/reftable.txt | 79 ++++++++++++++++------------ > 1 file changed, 44 insertions(+), 35 deletions(-) > > diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt > index 6223538d64e..1464c4e7437 100644 > --- a/Documentation/technical/reftable.txt > +++ b/Documentation/technical/reftable.txt > @@ -29,7 +29,7 @@ Objectives > > * Near constant time lookup for any single reference, even when the > repository is cold and not in process or kernel cache. > -* Near constant time verification if a SHA-1 is referred to by at least > +* Near constant time verification if an object ID is referred to by at least > one reference (for allow-tip-sha1-in-want). Good. These are called "object names", though. > @@ -193,8 +193,8 @@ and non-aligned files. > Very small files (e.g. a single ref block) may omit `padding` and the ref > index to reduce total file size. > > -Header > -^^^^^^ > +Header (version 1) > +^^^^^^^^^^^^^^^^^^ > > A 24-byte header appears at the beginning of the file: > > @@ -215,6 +215,27 @@ used in a stack for link:#Update-transactions[transactions], these > fields can order the files such that the prior file’s > `max_update_index + 1` is the next file’s `min_update_index`. > > +Header (version 2) > +^^^^^^^^^^^^^^^^^^ > + > +A 28-byte header appears at the beginning of the file: > + > +.... > +'REFT' > +uint8( version_number = 2 ) > +uint24( block_size ) > +uint64( min_update_index ) > +uint64( max_update_index ) > +uint32( hash_id ) > +.... > + > +The header is identical to `version_number=1`, with the 4-byte hash ID > +("sha1" for SHA1 and "s256" for SHA-256) append to the header. 
> + > +For maximum backward compatibility, it is recommended to use version 1 when > +writing SHA1 reftables. > + > + > First ref block > ^^^^^^^^^^^^^^^ > > @@ -302,8 +323,8 @@ The `value` follows. Its format is determined by `value_type`, one of > the following: > > * `0x0`: deletion; no value data (see transactions, below) > -* `0x1`: one 20-byte object id; value of the ref > -* `0x2`: two 20-byte object ids; value of the ref, peeled target > +* `0x1`: one object id; value of the ref > +* `0x2`: two object ids; value of the ref, peeled target Ah, OK, I pointed out these future-proofing for the previous step, but as long as the end result is written in a hash-algorithm agnostic way, it is OK. Again these are called "object names", though. > * `0x3`: symbolic reference: `varint( target_len ) target` > @@ -434,7 +455,7 @@ works the same as in reference blocks. > > Because object identifiers are abbreviated by writers to the shortest > unique abbreviation within the reftable, obj key lengths are variable > -between 2 and 20 bytes. Readers must compare only for common prefix > +between 2 and 32 bytes. Readers must compare only for common prefix Is it allowed for a reftable file whose hash_id field says "sha1" to use more than 20 bytes of obj key? Phrasing it like "unique prefix of object name, no shorter than 2 bytes" would avoid the problem, I would think. This version also adds more ’ apostrophes, where we would prefer to place vanilla single quotes, which may need to be corrected in the conversion toolchain. I did not see any new typo introduced in this step. Thanks.
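
As an aside, here is a minimal sketch of how a reader might decode the
two header layouts quoted above, in case it helps make the obj key
question concrete.  It is not taken from any implementation.  The
function name `parse_reftable_header` and the `HASH_LEN` table are
made up for illustration, integers are assumed to be in network byte
order, and a version 1 file is assumed to imply SHA-1.  The
`max_obj_key_len` value assumes the answer to my question is "no",
i.e. an obj key is never longer than the object name of the hash in
use.

```python
import struct

# Hypothetical table: 4-byte hash_id -> raw object name length in bytes.
HASH_LEN = {b"sha1": 20, b"s256": 32}

HEADER_V1_LEN = 24
HEADER_V2_LEN = 28


def parse_reftable_header(data: bytes) -> dict:
    """Decode a version 1 or version 2 reftable file header (sketch)."""
    if len(data) < HEADER_V1_LEN or data[:4] != b"REFT":
        raise ValueError("not a reftable header")

    version = data[4]                               # uint8
    block_size = int.from_bytes(data[5:8], "big")   # uint24
    min_update_index, max_update_index = struct.unpack(">QQ", data[8:24])

    if version == 1:
        # Assumption: version 1 carries no hash_id and always means SHA-1.
        hash_id = b"sha1"
    elif version == 2:
        if len(data) < HEADER_V2_LEN:
            raise ValueError("truncated version 2 header")
        hash_id = data[24:28]                       # uint32, read as 4 ASCII bytes
    else:
        raise ValueError(f"unsupported version {version}")

    if hash_id not in HASH_LEN:
        raise ValueError(f"unknown hash_id {hash_id!r}")

    return {
        "version": version,
        "block_size": block_size,
        "min_update_index": min_update_index,
        "max_update_index": max_update_index,
        "hash_id": hash_id,
        # Assumed upper bound for obj keys: the full object name length.
        "max_obj_key_len": HASH_LEN[hash_id],
    }
```

With a bound derived from `hash_id` like this, the documentation would
not need to hard-code either 20 or 32.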