Re: Transition plan for git to move to a new hash function

On Sun, Mar 05, 2017 at 01:45:46PM +0000, Ian Jackson wrote:
> brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> > Instead, I was referring to areas like the notes code.  It has extensive
> > use of the last byte as a type of lookup table key.  It's very dependent
> > on having exactly one hash, since it will always want to use the last
> > byte.
> 
> You mean note_tree_search ?  (My tree here may be a bit out of date.)
> This doesn't seem difficult to fix.  The nontrivial changes would be
> mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.
> 
> It's true that like most of git there's a lot of hardcoded `sha1'.

I'm talking about the entire notes.c file.  There are several different
uses of "19" in there, and they represent at least two separate concepts.
My object-id-part9 series tries to split those out into logical
constants.
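
To make that distinction concrete, here is a rough sketch of how one
literal "19" can stand for two unrelated things.  The names are
invented for illustration and are not the constants from the series,
and the two concepts shown (the index of the last hash byte, and the
number of '/' separators in a fully fanned-out path) are an assumption
about what the literal stands for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, chosen for this sketch only. */
#define HASH_RAWSZ 20				/* bytes in a raw SHA-1 */
#define HASH_HEXSZ (2 * HASH_RAWSZ)		/* hex digits in a SHA-1 */

/* Concept 1: index of the last byte of a raw hash. */
#define KEY_LAST_INDEX (HASH_RAWSZ - 1)

/* Concept 2: most '/' separators in a path like "aa/bb/.../zz". */
#define MAX_FANOUT_SEPARATORS (HASH_HEXSZ / 2 - 1)

/* Stash a subtree prefix length in the last byte of a key. */
static void set_prefix_len(unsigned char *key, unsigned char len)
{
	assert(len <= HASH_RAWSZ);
	key[KEY_LAST_INDEX] = len;
}

/* Read the prefix length back out of the last byte. */
static unsigned char get_prefix_len(const unsigned char *key)
{
	return key[KEY_LAST_INDEX];
}

/* Length of a fully fanned-out path: hex digits plus separators. */
static size_t max_fanout_path_len(void)
{
	return HASH_HEXSZ + MAX_FANOUT_SEPARATORS;
}
```

Both constants evaluate to 19 for SHA-1, but only the first changes in
step with the raw hash size and only the second with the hex length,
which is why they should not share one literal.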

This code is not going to handle repositories with object IDs of
different lengths well, which I believe was your initial proposal.  I
originally thought that mixed-hash repositories would be viable as well,
but I no longer do.

> Are you arguing in favour of "replace git with git2 by simply
> s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
> Takeup of the new `git2' would be very slow because of the pain
> involved.

I'm arguing that the same binary ought to be able to handle both SHA-1
and the new hash.  I'm also arguing that a given object have exactly one
hash and that we not mix hashes in the same object.  A repository will
be composed of one type of object, and if that's the new hash, a lookup
table will be used to translate SHA-1.  We can synthesize the old
objects, should we need them.
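
As a minimal sketch of such a lookup table, assuming a 256-bit new
hash and an in-memory array sorted by SHA-1 (neither of which is
settled anywhere in this thread; the struct and function names are
hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_RAWSZ 20
#define NEW_RAWSZ 32	/* assumption: the new hash is 256 bits */

/* One row of the table: old name on the left, new name on the right. */
struct hash_map_entry {
	unsigned char sha1[SHA1_RAWSZ];
	unsigned char new_hash[NEW_RAWSZ];
};

/* Order entries by their SHA-1 only. */
static int entry_cmp(const void *a, const void *b)
{
	const struct hash_map_entry *x = a, *y = b;
	return memcmp(x->sha1, y->sha1, SHA1_RAWSZ);
}

/* Sort once after the table is built so lookups can use bsearch(). */
static void hash_map_sort(struct hash_map_entry *tab, size_t nr)
{
	qsort(tab, nr, sizeof(*tab), entry_cmp);
}

/* Return the new-hash name for a SHA-1, or NULL if it is not known. */
static const unsigned char *hash_map_lookup(const struct hash_map_entry *tab,
					    size_t nr,
					    const unsigned char *sha1)
{
	struct hash_map_entry key;
	const struct hash_map_entry *hit;

	assert(tab && sha1);
	memcpy(key.sha1, sha1, SHA1_RAWSZ);	/* only .sha1 is compared */
	hit = bsearch(&key, tab, nr, sizeof(*tab), entry_cmp);
	return hit ? hit->new_hash : NULL;
}
```

An on-disk version would more likely resemble a pack index than a flat
array, but the translation step is the same: SHA-1 in, new name out.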

That allows people to use the SHA-1 hashes (in my view, with a prefix,
such as "sha1:") in repositories using the new hash.  It also allows
verifying old tags and commits if need be.
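
A sketch of how the prefix could be recognised when parsing an object
name.  It assumes full-length names only (40 hex digits for SHA-1 and,
as a placeholder, 64 for the new hash) and ignores abbreviations; the
function and enum names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum hash_kind { HASH_NEW, HASH_SHA1 };

/*
 * Split NAME into a hash kind and a pointer to its hex digits.
 * Returns 0 on success, -1 if the digits are malformed.
 */
static int parse_prefixed_name(const char *name, enum hash_kind *kind,
			       const char **hex)
{
	static const char prefix[] = "sha1:";
	size_t want, i;

	assert(name && kind && hex);
	if (!strncmp(name, prefix, strlen(prefix))) {
		*kind = HASH_SHA1;
		*hex = name + strlen(prefix);
		want = 40;
	} else {
		*kind = HASH_NEW;
		*hex = name;
		want = 64;	/* assumption: 256-bit new hash */
	}
	if (strlen(*hex) != want)
		return -1;
	for (i = 0; i < want; i++)
		if (!isxdigit((unsigned char)(*hex)[i]))
			return -1;
	return 0;
}
```

With this shape, an unprefixed 40-digit name is rejected in a new-hash
repository rather than being silently treated as SHA-1.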

What I *would* like to see is an extension to the tag and commit objects
which names the hash that was used to make them.  That makes it easy to
determine which object the signature should be verified over, as it will
verify over only one of them.
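
Purely as an illustration of what reading such an extension might look
like: the header name "hash" and the fallback to "sha1" when it is
absent are both assumptions made for this sketch, not an agreed
format.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the value of a hypothetical "hash <name>" header from a commit
 * or tag buffer into OUT.  Objects without the header are assumed to
 * be SHA-1.  Returns 0 on success, -1 if OUT is too small.
 */
static int object_hash_name(const char *buf, char *out, size_t outsz)
{
	static const char key[] = "hash ";
	const size_t klen = sizeof(key) - 1;
	const char *line = buf;

	assert(buf && out && outsz > 0);
	while (*line && *line != '\n') {	/* a blank line ends the headers */
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len > klen && !strncmp(line, key, klen)) {
			size_t vlen = len - klen;

			if (vlen >= outsz)
				return -1;
			memcpy(out, line + klen, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : line + len;
	}
	if (strlen("sha1") >= outsz)
		return -1;
	strcpy(out, "sha1");
	return 0;
}
```

The scan stops at the first blank line, so text in the message body
that happens to start with "hash " is never mistaken for the header.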

> [1] I've heard suggestions here that instead we should expect users to
> "git1 fast-export", which you would presumably feed into "git2
> fast-import".  But what is `git1' here ?  Is it the current git
> codebase frozen in time ?  I don't think it can be.  With this
> conversion strategy, we will need to maintain git1 for decades.  It
> will need portability fixes, security fixes, fixes for new hostile
> compiler optimisations, and so on.  The difficulty of conversion means
> there will be pressure to backport new features from `git2' to `git1'.
> (Also this approach means that all signatures are definitively lost
> during the conversion process.)

I'm proposing we have a git hash-convert (the name doesn't matter that
much) that converts in place.  It rebuilds the objects and builds a
lookup table.  Since the contents of git objects are deterministic, this
makes it possible for each individual user to make the transition in
place.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
