Re: [PATCH v13 06/13] reftable: define version 2 of the spec to accommodate SHA256

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 19 May 2020 15:32:24 -0700

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
>
> Version 2 appends a hash ID to the file header, making it slightly larger.
>
> This commit also changes "SHA-1" into "object ID" in many places.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
> ---
>  Documentation/technical/reftable.txt | 79 ++++++++++++++++------------
>  1 file changed, 44 insertions(+), 35 deletions(-)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 6223538d64e..1464c4e7437 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -29,7 +29,7 @@ Objectives
>  
>  * Near constant time lookup for any single reference, even when the
>  repository is cold and not in process or kernel cache.
> -* Near constant time verification if a SHA-1 is referred to by at least
> +* Near constant time verification if an object ID is referred to by at least
>  one reference (for allow-tip-sha1-in-want).

Good.  These are called "object names", though.

> @@ -193,8 +193,8 @@ and non-aligned files.
>  Very small files (e.g. a single ref block) may omit `padding` and the ref
>  index to reduce total file size.
>  
> -Header
> -^^^^^^
> +Header (version 1)
> +^^^^^^^^^^^^^^^^^^
>  
>  A 24-byte header appears at the beginning of the file:
>  
> @@ -215,6 +215,27 @@ used in a stack for link:#Update-transactions[transactions], these
>  fields can order the files such that the prior file’s
>  `max_update_index + 1` is the next file’s `min_update_index`.
>  
> +Header (version 2)
> +^^^^^^^^^^^^^^^^^^
> +
> +A 28-byte header appears at the beginning of the file:
> +
> +....
> +'REFT'
> +uint8( version_number = 2 )
> +uint24( block_size )
> +uint64( min_update_index )
> +uint64( max_update_index )
> +uint32( hash_id )
> +....
> +
> +The header is identical to `version_number=1`, with the 4-byte hash ID
> +("sha1" for SHA1 and "s256" for SHA-256) appended to the header.
> +
> +For maximum backward compatibility, it is recommended to use version 1 when
> +writing SHA1 reftables.
> +
> +
>  First ref block
>  ^^^^^^^^^^^^^^^
>  
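
For readers following along, the two header layouts quoted above can be
read with something like the sketch below.  This is only an
illustration, not code from the patch: parse_header and HASH_SIZES are
made-up names, big-endian field order is assumed (as for other fixed
width fields in the reftable document), and treating a version 1 file
as SHA-1 is my reading of the text, not something the hunk spells out.

```python
import struct

# 4-byte hash IDs named in the quoted text, mapped to raw hash length in bytes.
HASH_SIZES = {b"sha1": 20, b"s256": 32}


def parse_header(buf):
    """Parse a version 1 (24-byte) or version 2 (28-byte) header (hypothetical helper)."""
    if len(buf) < 24 or buf[:4] != b"REFT":
        raise ValueError("not a reftable header")
    version = buf[4]                                    # uint8
    block_size = int.from_bytes(buf[5:8], "big")        # uint24, assumed big-endian
    min_idx, max_idx = struct.unpack(">QQ", buf[8:24])  # two uint64 fields
    if version == 1:
        # Assumption: version 1 carries no hash_id field and implies SHA-1.
        hash_id, size = b"sha1", 24
    elif version == 2:
        if len(buf) < 28:
            raise ValueError("truncated version 2 header")
        hash_id, size = bytes(buf[24:28]), 28
        if hash_id not in HASH_SIZES:
            raise ValueError("unknown hash_id")
    else:
        raise ValueError("unsupported version")
    return {
        "version": version,
        "block_size": block_size,
        "min_update_index": min_idx,
        "max_update_index": max_idx,
        "hash_id": hash_id,
        "header_size": size,
    }
```

A reader written this way accepts both versions, which is what makes
the "use version 1 for SHA1" recommendation safe for older readers.
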
> @@ -302,8 +323,8 @@ The `value` follows. Its format is determined by `value_type`, one of
>  the following:
>  
>  * `0x0`: deletion; no value data (see transactions, below)
> -* `0x1`: one 20-byte object id; value of the ref
> -* `0x2`: two 20-byte object ids; value of the ref, peeled target
> +* `0x1`: one object id; value of the ref
> +* `0x2`: two object ids; value of the ref, peeled target

Ah, OK, I pointed out the need for this kind of future-proofing in
the previous step, but as long as the end result is written in a
hash-algorithm agnostic way, it is OK.  Again these are called
"object names", though.

>  * `0x3`: symbolic reference: `varint( target_len ) target`

> @@ -434,7 +455,7 @@ works the same as in reference blocks.
>  
>  Because object identifiers are abbreviated by writers to the shortest
>  unique abbreviation within the reftable, obj key lengths are variable
> -between 2 and 20 bytes. Readers must compare only for common prefix
> +between 2 and 32 bytes. Readers must compare only for common prefix

Is it allowed for a reftable file whose hash_id field says "sha1" to
use more than 20 bytes of obj key?  Phrasing it like "unique prefix
of object name, no shorter than 2 bytes" would avoid the problem, I
would think.
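
To make the concern concrete, the bound I have in mind depends on the
hash in use, not on a fixed 32.  A minimal sketch, with hypothetical
names (valid_obj_key_len and HASH_SIZES are not from the patch) and
the hash IDs taken from the version 2 header description above:

```python
# Raw object name length in bytes for each 4-byte hash ID.
HASH_SIZES = {b"sha1": 20, b"s256": 32}


def valid_obj_key_len(hash_id, key_len):
    """Return True if an obj key of key_len bytes is plausible for hash_id.

    A key is a unique prefix of an object name, no shorter than 2 bytes
    and no longer than the full object name for the file's hash.
    """
    return 2 <= key_len <= HASH_SIZES[hash_id]
```

With this, a 21-byte key in a "sha1" file is rejected while the same
length is fine in an "s256" file.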

This version also adds more

	’

apostrophes, where we would prefer to place vanilla single quotes,
which may need to be corrected in the conversion toolchain.

I did not see any new typo introduced in this step.

Thanks.