Re: [PATCH v12 06/12] reftable: define version 2 of the spec to accommodate SHA256

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 08 May 2020 12:59:55 -0700

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
> ---
>  Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 9fa4657d9ff..ee3f36ea851 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -193,8 +193,8 @@ and non-aligned files.
>  Very small files (e.g. 1 only ref block) may omit `padding` and the ref

Hmph, I am seeing nbsp before '1' and am wondering where it came from.

>  index to reduce total file size.
>  
> -Header
> -^^^^^^
> +Header (version 1)
> +^^^^^^^^^^^^^^^^^^
>  
>  A 24-byte header appears at the beginning of the file:
>  
> @@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
>  fields can order the files such that the prior file’s
>  `max_update_index + 1` is the next file’s `min_update_index`.

Am I correct to assume that we do not plan to support a repository
with a mixed set of refs, some referring to a commit by its SHA-1
object name while others use a SHA-256 object name?

> +Header (version 2)
> +^^^^^^^^^^^^^^^^^^
> +
> +A 28-byte header appears at the beginning of the file:
> +
> +....
> +'REFT'
> +uint8( version_number = 1 )

Shouldn't this be 2 instead, as v1 lacked the hash_id field?

> +uint24( block_size )
> +uint64( min_update_index )
> +uint64( max_update_index )
> +uint32( hash_id )
> +....
> +
> +The header is identical to `version_number=1`, with the 4-byte hash ID
> +("sha1" for SHA1 and "s256" for SHA-256) append to the header.

Am I correct to assume that SHA-1 repositories are encouraged to use
version 2 when the code becomes available?
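
For illustration only, here is a minimal Python sketch of how a reader
might parse the two header layouts quoted above. It assumes the
version 2 header carries `version_number = 2` (the question raised
above, not what the patch text says), that multi-byte fields are in
network byte order as elsewhere in reftable.txt, and that a version 1
file implies SHA-1. `parse_header` and `HEADER_SIZE` are hypothetical
names, not anything from the series.

```python
import struct

# Header sizes in bytes, as stated in the quoted spec text.
HEADER_SIZE = {1: 24, 2: 28}


def parse_header(buf):
    """Parse a reftable file header and return its fields as a dict."""
    if buf[:4] != b"REFT":
        raise ValueError("bad magic")
    if len(buf) < 5:
        raise ValueError("truncated header")
    version = buf[4]
    if version not in HEADER_SIZE:
        raise ValueError("unknown version %d" % version)
    if len(buf) < HEADER_SIZE[version]:
        raise ValueError("truncated header")

    # uint24 block_size follows the one-byte version number.
    block_size = int.from_bytes(buf[5:8], "big")
    min_update_index, max_update_index = struct.unpack(">QQ", buf[8:24])

    # Assumption: version 1 has no hash_id field and means SHA-1.
    hash_id = b"sha1" if version == 1 else buf[24:28]

    return {
        "version": version,
        "block_size": block_size,
        "min_update_index": min_update_index,
        "max_update_index": max_update_index,
        "hash_id": hash_id,
    }
```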

>  First ref block
>  ^^^^^^^^^^^^^^^
>  
> @@ -671,14 +689,8 @@ Footer
>  After the last block of the file, a file footer is written. It begins
>  like the file header, but is extended with additional data.
>  
> -A 68-byte footer appears at the end:
> -
>  ....
> -    'REFT'
> -    uint8( version_number = 1 )
> -    uint24( block_size )
> -    uint64( min_update_index )
> -    uint64( max_update_index )
> +    HEADER
>  
>      uint64( ref_index_position )
>      uint64( (obj_position << 5) | obj_id_len )
> @@ -701,12 +713,16 @@ obj blocks.
>  * `obj_index_position`: byte position for the start of the obj index.
>  * `log_index_position`: byte position for the start of the log index.
>  
> +The size of the footer is 68 bytes for version 1, and 72 bytes for
> +version 2.
> +
>  Reading the footer
>  ++++++++++++++++++
>  
> -Readers must seek to `file_length - 68` to access the footer. A trusted
> -external source (such as `stat(2)`) is necessary to obtain
> -`file_length`. When reading the footer, readers must verify:
> +Readers must first read the file start to determine the version
> +number. Then they seek to `file_length - FOOTER_LENGTH` to access the
> +footer. A trusted external source (such as `stat(2)`) is necessary to
> +obtain `file_length`. When reading the footer, readers must verify:
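
As a rough sketch of the two-step read described in the hunk above
(peek at the start of the file for the version, then seek back from
the end), something like the following Python would do. The footer
sizes come from the quoted text; `footer_offset` and `FOOTER_SIZE` are
hypothetical names, and the sketch again assumes version 2 files carry
`version_number = 2`.

```python
# Footer sizes in bytes for each version, per the quoted spec text.
FOOTER_SIZE = {1: 68, 2: 72}


def footer_offset(first_bytes, file_length):
    """Return the byte offset at which the footer starts.

    first_bytes: at least the first 5 bytes of the file.
    file_length: size from a trusted source such as stat(2).
    """
    if len(first_bytes) < 5 or first_bytes[:4] != b"REFT":
        raise ValueError("not a reftable file")
    version = first_bytes[4]
    if version not in FOOTER_SIZE:
        raise ValueError("unknown version %d" % version)
    if file_length < FOOTER_SIZE[version]:
        raise ValueError("file too short to hold a footer")
    return file_length - FOOTER_SIZE[version]
```

The cost is one extra read at the start of the file before the footer
can be located.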

In any case, the size of this patch is pleasant to see---it must be
a sign that the previous step was done well: it did not hardcode the
"hash size is 20 bytes" assumption all over the place, and instead
consistently used "this field holds N+m bytes where N is the size of
the hash described in the REFT header".

Nicely done.