Re: weaning distributions off tarballs: extended verification of git tags

Sam Vilain <sam@xxxxxxxxxx> · Mon, 02 Mar 2015 12:52:09 -0800

On 03/02/2015 12:08 PM, Junio C Hamano wrote:
>> I have a
>> hazy recollection of what it would take to replace SHA-1 in git with
>> something else; it should be possible (though tricky) to do it lazily,
>> where a tree entry has bits (eg, some of the currently unused file
>> mode bits) to denote which hash algorithm is in use for the entry.
>> However I don't think that got past idea stage...
> I think one reason why it didn't was because it would not work well.
> That "bit that tells this is a new object or old" would mean that a
> single tree can have many different object names, depending on which
> of its component entries are using that bit and which aren't.  There
> goes the "we know two trees with the same object name are identical
> without recursing into them" optimization out the window.
>
> Also it would make it impossible to do what you suggest to Joey to
> do, i.e. "exactly the same way that git does", once you start saying
> that a tree object can be encoded in more than one way,
> wouldn't it?

I was reasoning that people would rather not have to rewrite their whole
history in order to switch checksum algorithms, and that allowing trees
to be converted lazily would make things more efficient.  However, I
think I see your point here that this doesn't work.

However, if the hash algorithm were recorded as a per-commit header,
then only the first commit that changes the hashing algorithm would have
to re-checksum each of the files, and only those in the current tree,
not all the way back to the beginning of history.  The delta logic
should not have to care, and objects with the same content but different
object IDs should pack perfectly, so long as git-pack-objects knows to
re-checksum objects with the available hash algorithms and spot matches.
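
To make the "re-checksum and spot matches" idea concrete, here is a
rough Python sketch (not git code).  It relies on the existing blob
framing, where the object ID is the hash of "blob <size>\0" followed by
the content.  The function names and the use of SHA-256 as the second
algorithm are only assumptions for illustration.

```python
import hashlib

def blob_id(data: bytes, algo: str = "sha1") -> str:
    """Object ID of a blob: hash of the header 'blob <size>\\0' plus content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.new(algo, header + data).hexdigest()

def find_cross_algo_matches(blobs, algos=("sha1", "sha256")):
    """Group the IDs that name identical content under different algorithms.

    `blobs` is an iterable of raw blob contents.  The result maps each ID
    (under any algorithm) to the full set of IDs for that same content.
    """
    aliases = {}
    for data in blobs:
        ids = frozenset(blob_id(data, a) for a in algos)
        for oid in ids:
            aliases[oid] = ids
    return aliases

aliases = find_cross_algo_matches([b"", b"hello\n"])
empty_sha1 = blob_id(b"")
print(empty_sha1)                 # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(aliases[empty_sha1]))   # 2: the SHA-1 name and the SHA-256 name
```

A packer with a table like this could treat the two names as one object
and store the content only once.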

Other operations, like diff, that span commits with different hashing
algorithms might be able to get away with their existing object ranking
algorithms, caching alternate object IDs for content as they operate so
that exact matching still works across hash algorithm changes.
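
As a sketch of what such a cache might look like (hypothetical names,
Python rather than git's C, and assuming SHA-1 plus SHA-256 as the two
algorithms): each piece of content seen is recorded under all of its
IDs, so an old-side SHA-1 name and a new-side SHA-256 name can be
recognised as an exact match without comparing the content again.

```python
import hashlib

def blob_id(data: bytes, algo: str) -> str:
    """Hash of 'blob <size>\\0' plus content, as git frames blobs."""
    header = b"blob %d\x00" % len(data)
    return hashlib.new(algo, header + data).hexdigest()

class AltIdCache:
    """Remembers, for each object ID seen, its ID under the other algorithms."""

    def __init__(self, algos=("sha1", "sha256")):
        self.algos = algos
        self._canonical = {}  # (algo, oid) -> tuple of all IDs for that content

    def record(self, data: bytes) -> None:
        """Hash the content with every algorithm and index all of its names."""
        key = tuple(blob_id(data, a) for a in self.algos)
        for algo, oid in zip(self.algos, key):
            self._canonical[(algo, oid)] = key

    def same_content(self, a, b) -> bool:
        """a and b are (algo, oid) pairs; True only if both are known aliases."""
        ka = self._canonical.get(a)
        return ka is not None and ka == self._canonical.get(b)

cache = AltIdCache()
cache.record(b"hello\n")
old_side = ("sha1", blob_id(b"hello\n", "sha1"))
new_side = ("sha256", blob_id(b"hello\n", "sha256"))
print(cache.same_content(old_side, new_side))  # True
```

Unknown IDs simply fail to match, in which case diff would fall back to
whatever similarity ranking it already does.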

But actually, for the original problem (just producing a signature with
a different hashing algorithm), it would probably be sufficient to just
re-hash the current commit and the current tree recursively, so that the
mixed hash-algorithm case does not need to exist.  But I'm just thinking
it might not be too hard to make git nicely generic, to be well prepared
for when a second-preimage attack on SHA-1 becomes practical.
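
Re-hashing a tree recursively could look roughly like the Python sketch
below.  It follows the tree encoding git uses today ("<mode> <name>\0"
followed by the raw ID of each entry, with directories sorted as if
their names ended in "/") and only swaps the hash function.  The nested
dict representation and the function names are made up for
illustration, and it assumes every file has mode 100644 (no
executables, symlinks or submodules).

```python
import hashlib

def object_id(kind: bytes, body: bytes, algo: str) -> bytes:
    """Raw digest of '<kind> <size>\\0<body>', the framing git uses for objects."""
    header = kind + b" %d\x00" % len(body)
    return hashlib.new(algo, header + body).digest()

def tree_id(tree: dict, algo: str) -> bytes:
    """Hash a nested dict: name -> bytes for a file, name -> dict for a subtree."""
    def sort_key(item):
        name, value = item
        # git orders a directory as if its name had a trailing slash
        return name + b"/" if isinstance(value, dict) else name

    body = b""
    for name, value in sorted(tree.items(), key=sort_key):
        if isinstance(value, dict):
            mode, oid = b"40000", tree_id(value, algo)             # recurse
        else:
            mode, oid = b"100644", object_id(b"blob", value, algo)
        body += mode + b" " + name + b"\x00" + oid
    return object_id(b"tree", body, algo)

tree = {b"README": b"hello\n", b"src": {b"main.c": b"int main(void) { return 0; }\n"}}
print(tree_id({}, "sha1").hex())      # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(tree_id(tree, "sha256").hex())  # same content, named with SHA-256 throughout
```

Because every entry in one pass uses the same algorithm, each tree still
has exactly one name per algorithm, and the commit object could be
framed and hashed the same way.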

Sam