Re: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256

On 2023-09-08 at 23:10:18, Eric W. Biederman wrote:
> The v3 pack index file as documented has a lot of complexity making it
> difficult to implement correctly.  I worked with brian's preliminary
> implementation and it took several passes to get the bugs out.
> 
> The complexity also requires multiple table look-ups to find all of
> the information that is needed to translate from one kind of oid to
> another.  Which can't be good for cache locality.
> 
> Even worse, coming up with a new index file version requires making
> changes that have the potential to break anything that uses the index
> of a pack file.
> 
> Instead of continuing to deal with the chance of breaking things
> besides the oid mapping functionality, the additional complexity in
> the file format, and the worry about whether the performance would be
> reasonable, I stripped down the problem to its fundamental complexity
> and came up with a file format that is exactly about mapping one kind
> of oid to another, and only supports two kinds of oids.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
>  .../technical/hash-function-transition.txt    | 40 +++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
> index ed574810891c..4b937480848a 100644
> --- a/Documentation/technical/hash-function-transition.txt
> +++ b/Documentation/technical/hash-function-transition.txt
> @@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like
>  today. The content that is compressed and stored uses SHA-256 content
>  instead of SHA-1 content.
>  
> +Per Pack Mapping Table
> +~~~~~~~~~~~~~~~~~~~~~~
> +Pack compat map (.compat) files have the following format:
> +
> +HEADER:
> +	4-byte signature:
> +	    The signature is: {'C', 'M', 'A', 'P'}
> +	1-byte version number:
> +	    Git only writes or recognizes version 1.
> +	1-byte First Object Id Version
> +	    We infer the length of object IDs (OIDs) from this value:
> +		1 => SHA-1
> +		2 => SHA-256

One thing I forgot to mention here is that we have 32-bit format IDs
for these in the structure, so we should use them here and below.  These
are GIT_SHA1_FORMAT_ID and GIT_SHA256_FORMAT_ID.

Not that I would encourage distributing such software, but it makes it
much easier for people to experiment with additional hash algorithms (in
terms of performance, etc.) if we make the space a little sparser.
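
To make the suggestion concrete, here is a rough sketch of how a reader
of the header could infer the OID length from a 4-byte format ID rather
than from a 1-byte version number.  The two constants carry the values
Git defines in hash.h (the ASCII strings "sha1" and "s256" read as
big-endian integers).  The helper names read_be32() and
oid_rawsz_for_format_id() are made up for this illustration, and storing
the field big-endian is an assumption, not something the patch specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as defined in Git's hash.h: "sha1" and "s256", big-endian. */
#define GIT_SHA1_FORMAT_ID   0x73686131u
#define GIT_SHA256_FORMAT_ID 0x73323536u

/* Hypothetical helper: decode a 4-byte big-endian field from the header. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: raw OID length in bytes for a format ID,
 * or 0 if the ID is not one this reader recognizes.
 */
static size_t oid_rawsz_for_format_id(uint32_t format_id)
{
	switch (format_id) {
	case GIT_SHA1_FORMAT_ID:
		return 20;	/* SHA-1 */
	case GIT_SHA256_FORMAT_ID:
		return 32;	/* SHA-256 */
	default:
		return 0;	/* unknown or experimental algorithm */
	}
}
```

With only two values in use out of a 32-bit space, an experimental
algorithm can pick its own ID without colliding with a future official
one, which is the sparseness mentioned above.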

> +	1-byte Second Object Id Version
> +	    We infer the length of object IDs (OIDs) from this value:
> +		1 => SHA-1
> +		2 => SHA-256

In your new patch for the next part, you consider that there might be
multiple compatibility hash algorithms.  I had anticipated only one at
a time in my series, but I'm not opposed to multiple if you want to
support that.

However, here you're making the assumption that there are only two.  If
you want to support multiple values, we need to explicitly consider that
both here (where we need a count of object ID versions and multiple
tables, one for each algorithm), and in the follow-up series.

I had not considered more than two algorithms because it substantially
complicates the code and requires us to develop n*(n-1) tables, but I'm
not the one volunteering to do most of the work here, so I'll defer to
your preference.  (I do intend to send a patch or two, though.)
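
For a sense of scale, the n*(n-1) figure counts one table per ordered
(from, to) pair of distinct algorithms.  A throwaway sketch, with
mapping_table_count() being a name invented here and not anything in
Git:

```c
/*
 * Hypothetical helper: number of directed mapping tables needed so that
 * each of n hash algorithms can be translated to every other one.
 */
static unsigned int mapping_table_count(unsigned int n)
{
	return n * (n - 1);
}
```

Two algorithms need 2 tables (SHA-1 to SHA-256 and back), three need 6,
and four need 12.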

It's also possible we could be somewhat provident and define the on-disk
formats for multiple algorithms and then punt on the code until later if
you prefer that.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
