Re: Migrating away from SHA-1?

"H. Peter Anvin" <hpa@xxxxxxxxx> · Tue, 12 Apr 2016 18:38:25 -0700

On 04/12/16 18:03, Junio C Hamano wrote:

and so on. Of course trees don't have any space for this; they have a
fixed-length for the hash part of each record, which is basically:

   <mode> <name> NUL <20-byte-sha1>

So we'd probably need a "treev2" object type that gives room for an
algorithm byte (or we'd have to try to shove it into the mode, but since
old versions won't know the new algorithm anyway, I don't think it
solves that much...). Or you can just define for the whole tree object
(either implicit in its type, or in a header) that it always uses
algorithm X.

This will hurt the performance a lot during the transition period as
it no longer will be possible to rely on "most of the time a fine
grained commit changes only a small part of the tree, and we can
cheaply avoid descending into trees that haven't changed because we
can tell that the corresponding tree objects in the pre- and post-
trees have the same object name" optimization.  But we cannot avoid
it.

Not really, because you can point to the algoX hash even for the 
existing objects.

Perhaps the tree object can add a format descriptor at the beginning; 
something like:

<invalid mode number> <hash format used>

Transitioning to that would be something like:

   0. Overhaul all of the git code to handle arbitrary-sized object ids.

   1. Decide on the new algorithm and implement it in git.

   2. Recognize parameterized object ids in commits and tags (designing
      format, implementing the reading side).

   3. Recognize parameterized object ids somehow in trees (designing
      format, implementing the reading side).

   4. Teach the object database to index objects by the new algorithm (or
      possibly both algorithms).

   5. Add a protocol extension so that both sides can decide which
      algorithm is being used when they talk about oids.

   6. Add a config option to write references in objects using the new
      algorithm.

   7. After a while, flip the config option on. Hopefully the readers
      from steps 1-5 have percolated to the masses by then, and it's not
      a horrible flag day.

We're basically on step 0 right now. I'm sure I'm missing some
subtleties in there, too.

One subtlety is that 7. "not a flag day" may not be a good thing.

There has to be a section of a history that spans the transition,
set of commits and trees that have pointers to both kinds of object
names.  The narrower such a section of the history, the more
pleasant to use the result of the transition would be.

Different projects that can have their own flag days at their own
pace is a good thing, so the above observation does not invalidate
your transition plan, though.

I don't think there is any way this can *not* be by repository and 
somehow require a manual operation in order to preserve the 
cryptographic integrity.  In some ways, the transition point and the 
transition table becomes a special kind of tag object.  There may have 
to be more than one in the case of commits in multiple trees.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html