Re: [PATCH v4 4/5] Add reftable library

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 11 Feb 2020 08:31:55 -0800

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> writes:

> I've uploaded https://git.eclipse.org/r/c/157501/ that proposes a way
> to encode the hash size. I look forward to feedback.

Development and design discussion happens here.  Please do not
require people to go click at an external website---once people
start doing so, we'd have to go around 47 different places to piece
one discussion together, which is just crazy.

Here is what I saw:

    A 24-byte header appears at the beginning of the file:

        'REFT'
        uint8( format_version )
        uint24( block_size )
        uint64( min_update_index )
        uint64( max_update_index )

    The `format_version` is a byte, and it indicates both the version of the on-disk
    format, as well as the size of the hash. The hash size is indicated in the MSB
    of the `format_version`. For the SHA1 hash, `(format_version & 0x80) == 0` and all
    hash values are 20 bytes. For SHA256, `(format_version & 0x80) != 0`, and all hash
    values are 32 bytes. Future hash functions may be added by using more bits at
    the right.

    The file format version can be extracted as `format_version & 0x7f`. Currently,
    only version 1 is defined.

If you cast in stone that "& 0x7f is the way to extract the
version", then you cannot promise that you may steal more bits at
the right of MSB to support more hash functions, as you've reserved
the rightmost 7 bits already for the version number with 0x7f and
there are only 8 bits in your byte.
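
To make the bit budget concrete, here is a rough sketch of the
decoding as the quoted text describes it.  The names are made up
for illustration and are not from the patch; only the two masks
come from the proposal.

```c
#include <stdint.h>

/* Hypothetical names; the mask values are the ones in the proposal. */
#define REFT_HASH_MASK    0x80u /* MSB: clear = SHA-1, set = SHA-256 */
#define REFT_VERSION_MASK 0x7fu /* low 7 bits: file format version */

static unsigned reft_version(uint8_t format_version)
{
	return format_version & REFT_VERSION_MASK;
}

static unsigned reft_hash_size(uint8_t format_version)
{
	/* Parenthesized on purpose: in C, == binds tighter than &. */
	return (format_version & REFT_HASH_MASK) != 0 ? 32 : 20;
}

/* Bits of the byte that neither field has claimed. */
static unsigned reft_spare_bits(void)
{
	return 0xffu & ~(REFT_HASH_MASK | REFT_VERSION_MASK);
}
```

The two masks together cover all eight bits, so reft_spare_bits()
comes out as 0: a third hash function has nowhere to go without
taking bits away from the version.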

It seems that you are trying to make the format too dense?  Is it
too much of a waste to use a separate word or a byte for the hash?  Or
perhaps declare that format version 1 uses SHA-1, format version 2
uses SHA-256, etc. (in other words, do we want to support both SHA-1
and SHA-256 when we are at format version 7)?
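
Roughly, the two alternatives would decode along these lines.
This is only a sketch to show the shape of each option; the
identifiers and the hash ID values are assumptions of mine, not
anything proposed so far.

```c
#include <stdint.h>

/* (a) A separate byte names the hash.  ID values are hypothetical. */
enum reft_hash_id {
	REFT_HASH_SHA1 = 1,
	REFT_HASH_SHA256 = 2
};

static int reft_hash_size_from_id(uint8_t hash_id)
{
	switch (hash_id) {
	case REFT_HASH_SHA1:
		return 20;
	case REFT_HASH_SHA256:
		return 32;
	default:
		return -1; /* unknown hash */
	}
}

/* (b) The format version alone implies the hash. */
static int reft_hash_size_from_version(uint8_t format_version)
{
	switch (format_version) {
	case 1:
		return 20; /* version 1 uses SHA-1 */
	case 2:
		return 32; /* version 2 uses SHA-256 */
	default:
		return -1; /* unknown version */
	}
}
```

With (a) the version and the hash can evolve independently at the
cost of an extra byte in the header; with (b) the byte is saved,
but every later format change has to pick exactly one hash.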

Thanks.