Re: SHA1 collisions found

Brandon Williams <bmwill@xxxxxxxxxx> · Thu, 2 Mar 2017 13:46:10 -0800

On 02/26, Jeff King wrote:
> On Sun, Feb 26, 2017 at 01:13:59AM +0000, Jason Cooper wrote:
> 
> > On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > > I was thinking we would need mixed mode support for smoother
> > > transition, but it now seems to me that the approach to stratify the
> > > history into old and new is workable.
> > 
> > As someone looking to deploy (and having previously deployed) git in
> > unconventional roles, I'd like to add one caveat.  The flag day in the
> > history is great, but I'd like to be able to confirm the integrity of
> > the old history.
> > 
> > "Counter-hashing" the blobs is easy enough, but the trees, commits and
> > tags would need to have, iiuc, some sort of cross-reference.  As in my
> > previous example, "git tag -v v3.16" also checks the counter hash to
> > further verify the integrity of the history (yes, it *really* needs to
> > check all of the old hashes, but I'd like to make sure I can do step one
> > first).
> > 
> > Would there be opposition to counter-hashing the old commits at the flag
> > day?
> 
> I don't think a counter-hash needs to be embedded into the git objects
> themselves. If the "modern" repo format stores everything primarily as
> sha-256, say, it will probably need to maintain a (local) mapping table
> of sha1/sha256 equivalence. That table can be generated at any time from
> the object data (though I suspect we'll keep it up to date as objects
> enter the repository).
> 
> At the flag day[1], you can make a signed tag with the "correct" mapping
> in the tag body (so part of the actual GPG signed data, not referenced
> by sha1). Then later you can compare that mapping to the object content
> in the repo (or to the local copy of the mapping based on that data).
> 
> -Peff
> 
> [1] You don't even need to wait until the flag day. You can do it now.
>     This is conceptually similar to the git-evtag tool, though it just
>     signs the blob contents of the tag's current tree state. Signing the
>     whole mapping lets you verify the entirety of history, but of course
>     that mapping is quite big: 20 + 32 bytes per object for
>     sha1/sha-256, which is ~250MB for the kernel. So you'd probably not
>     want to do it more than once.

There were a few of us discussing this sort of approach internally.  We
also figured that, given some performance hit, you could maintain your
repo in sha256 and do some translation to sha1 if you need to push or
fetch to a server which has the the repo in a sha1 format.  This way you
can convert your repo independently of the rest of the world.

As for storing the translation table, you should really only need to
maintain the table until old clients are phased out and all of the repos
of a project have experienced flag day and have been converted to
sha256.

-- 
Brandon Williams