Re: Transition plan for git to move to a new hash function

Ian Jackson <ijackson@xxxxxxxxxxxxxxxxxxxxxx> · Sun, 5 Mar 2017 13:45:46 +0000

brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> Instead, I was referring to areas like the notes code.  It has extensive
> use of the last byte as a type of lookup table key.  It's very dependent
> on having exactly one hash, since it will always want to use the last
> byte.

You mean note_tree_search ?  (My tree here may be a bit out of date.)
This doesn't seem difficult to fix.  The nontrivial changes would be
mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.

It's true that like most of git there's a lot of hardcoded `sha1'.

Are you arguing in favour of "replace git with git2 by simply
s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
Takeup of the new `git2' would be very slow because of the pain
involved.

Any sensible method of moving to a new hash that isn't "make a
completely incompatible new version of git" is going to involve
teaching the code we have in git right now to handle new hashes as
well as sha1 hashes.

Even if the plan is to try to convert old data, rather than keep it
and be able to refer to it from new data, something will have to be
able to parse old packfiles, old commits, old tags, old notes,
etc. etc. etc.  Either that's going to be some separate conversion
utility, or it has to be the same code in git that's there already.[1]

The ability to handle both old-format and new-format data can be
achieved in the code by doing away with the hardcoded sha1s, so that
instead the hash is an abstract data type with operations like
"initialise", "compare", "get a nybble", etc.  We've already seen
patches going in this direction.

[1] I've heard suggestions here that instead we should expect users to
"git1 fast-export", which you would presumably feed into "git2
fast-import".  But what is `git1' here ?  Is it the current git
codebase frozen in time ?  I don't think it can be.  With this
conversion strategy, we will need to maintain git1 for decades.  It
will need portability fixes, security fixes, fixes for new hostile
compiler optimisations, and so on.  The difficulty of conversion means
there will be pressure to backport new features from `git2' to `git1'.
(Also this approach means that all signatures are definitively lost
during the conversion process.)

So if we want to provide both `git1' and `git2', it's still better to
compile `git' and `git2' from the same codebase.  But if we do that,
the resulting ifdeffery and/or other hash abstractions are most of the
work to be hash-agile.  It's just the difference between a
compile-time and runtime switch.

I think the incompatibile approach is much more work in the medium and
long term - and it leads to a longer transition period.

Bear in mind that our objective is not to minimise the time until the
new version of git is available.  Our objective is to minimise the
time until (most) people are using it.  An approach which takes longer
for the git community to develop, but which is easier to deploy, can
easily be better.

Or maybe the objective is to minimise overall effort.  In which case
more work on git, for an easier transition for all the users, seems
like a no-brainer.  I think this is arguably true even from the point
of view of effort amongst the community of git contributors.  git
contributors start out as git users - and if git's users are all busy
struggling with a difficult transition, they will have less time to
improve other stuff and will tend less to get involved upstream.  (And
they may be less inclined to feel that the git upstream developers
understand their needs well.)

The better alternative is to adopt a plan that has a clear and
straightforward transition for users, and ask git users to help with
implementation.

I think many git users, including sophisticated users and competent
organisations, are concerned about sha1.  Currently most of those
users will find it difficult to help, because it's not clear to them
what needs to be done.

Thanks,
Ian.