Re: Starting to think about sha-256?

Linus Torvalds <torvalds@xxxxxxxx> · Sun, 27 Aug 2006 15:35:20 -0700 (PDT)

On Mon, 28 Aug 2006, Johannes Schindelin wrote:
> > 
> > Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 
> > should be fairly straightforward. It's never been used since the early 
> > days (and has limits like a maximum of a million objects etc. that may need 
> > fixing), but it shouldn't be "fundamentally hard" per se.
> 
> But what about signed tags? (This issue has come up before, but never has 
> been addressed.)

Signed tags fundamentally have to be re-signed. That's by design: if 
somebody could rewrite an archive and signed tags would still be accepted 
as having the right signature, that would be a _serious_ sign of a totally 
broken security model.

The git security model isn't broken.
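
A rough sketch of why conversion forces re-signing, assuming a simplified
tag layout and using an HMAC purely as a stand-in for a real GPG signature.
The helper names, the key and the example commit body are all hypothetical,
and this is not git's actual code:

```python
import hashlib
import hmac

def object_id(kind: str, body: bytes, algo: str) -> str:
    """Name an object by hashing a '<kind> <size>' header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()

def tag_payload(target_id: str, name: str) -> bytes:
    """Simplified tag body: it embeds the id of the object it points at."""
    return f"object {target_id}\ntype commit\ntag {name}\n".encode()

def sign(payload: bytes, key: bytes) -> str:
    """Stand-in for a GPG signature (assumption: HMAC, for illustration only)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

KEY = b"hypothetical-signing-key"
commit_body = b"tree 0000\n\nexample commit\n"

# The same commit gets a different name under each hash function...
old_payload = tag_payload(object_id("commit", commit_body, "sha1"), "v1.0")
new_payload = tag_payload(object_id("commit", commit_body, "sha256"), "v1.0")

# ...so the signed bytes change and the old signature cannot carry over.
old_signature = sign(old_payload, KEY)
still_valid = hmac.compare_digest(old_signature, sign(new_payload, KEY))
```

Here `still_valid` comes out false: the signature covers the target's name,
and the name is exactly what a conversion rewrites.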

> I also thought about supporting hybrid hashes, i.e. that older objects 
> still can be hashed with SHA-1. Alas, a simple thought experiment 
> demonstrates how silly that idea is: most of the objects will not change 
> between two revisions, and they'd have to be rehashed with SHA-256 (or 
> whatever we decide upon) anyway, so hybrids would do no good.

Indeed. Hybrids would not only do no good, but they would actually 
_actively_ hurt things, because they'd fundamentally break the notion that 
the hash being identical means that the object (blob, tree, subtree) is 
the same.

So allowing two names for the same object is very fundamentally wrong in 
git-speak. 
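
To illustrate, here is a minimal sketch of content addressing for a blob
(header "blob <size>", a NUL byte, then the content). The `algo` parameter
is an assumption added to show what a hybrid scheme would do:

```python
import hashlib

def blob_id(content: bytes, algo: str = "sha1") -> str:
    """Hash a blob the way git names it: 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

content = b"hello\n"

# One hash function: identical content always yields the identical name,
# so comparing names is enough to know two objects are the same.
same_name = blob_id(content) == blob_id(bytes(content))

# Hybrid scheme: the very same content now has two different names,
# so "names differ" no longer implies "objects differ".
sha1_name = blob_id(content, "sha1")
sha256_name = blob_id(content, "sha256")
hybrid_sees_difference = sha1_name != sha256_name
```

With a single hash, an unchanged tree can be skipped by comparing one name.
With two names per object, that shortcut is gone.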

> A better idea would be to increment the repository version, and expect 
> SHA-1 for version 1, SHA-256 for version >= 2.

Yes. It would be reasonably painful for users, though (as Krzysztof 
correctly points out). Every client would have to convert when a 
repository they track is converted.
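
A sketch of what "one hash per repository version" could look like. The
function names and the version-to-hash mapping are hypothetical, taken only
from the proposal quoted above and not from any existing git code:

```python
import hashlib

def hash_for_repo_version(version: int) -> str:
    """Hypothetical rule from the proposal: version 1 uses SHA-1, later ones SHA-256."""
    if version < 1:
        raise ValueError(f"unknown repository version: {version}")
    return "sha1" if version == 1 else "sha256"

def object_id(kind: str, body: bytes, repo_version: int) -> str:
    """Name an object using the single hash the whole repository has agreed on."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    algo = hash_for_repo_version(repo_version)
    return hashlib.new(algo, header + body).hexdigest()

v1_id = object_id("blob", b"hello\n", repo_version=1)
v2_id = object_id("blob", b"hello\n", repo_version=2)
```

The hash is chosen once per repository, never per object, so within any one
repository each object still has exactly one name. The cost is the one
mentioned above: every clone has to convert when the upstream does.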

> Even if the breakthrough really comes to full SHA-1, you still have to add 
> _at least_ 20 bytes of gibberish. Which would be harder to spot, but it 
> would be spotted.

Yeah, I don't think this is at all critical, especially since on a 
security level git really doesn't _depend_ on the hashes being 
cryptographically secure. As I explained early on (i.e. over a year ago, 
back when the whole design of git was being discussed), the _security_ of 
git actually depends not on cryptographic hashes, but simply on everybody 
being able to secure their own _private_ repository.

So the only thing git really _requires_ is a hash that is _unique_ for the 
developer (and there we are talking not of an _attacker_, but a benign 
participant).

That said, the cryptographic security of SHA-1 is obviously a real bonus. 
So I'd be disappointed if SHA-1 can be broken more easily (and I obviously 
already argued against using MD5, exactly because generating duplicates of 
that is fairly easy). But it's not "fundamentally required" in git per se.

[ The one exception: the "signed tags" security does depend on the hashes 
  being cryptographically strong. So again, breaking SHA-1 would not mean 
  that git stops working, but it _would_ potentially mean that if you 
  don't trust your own _private_ repository, the signed tag may no longer 
  protect you entirely ]

> This made me think about the use of hashes in git. Why do we need a hash 
> here (in no particular order):
> 
> 1) integrity checking,
> 2) fast lookup,
> 3) identifying objects (related to (2)),
> 4) trust.
> 
> Except for (4), I do not see why SHA-1 -- even if broken -- should not be 
> adequate. It is not like somebody found out that all JPGs tend to have 
> similar hashes so that collisions are more likely.

Correct. I'm pretty sure we had exactly this discussion around May 2005, 
but I'm too lazy to search ;)
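
For reference, a toy sketch of uses (1) to (3) from the list above. The
`ObjectStore` class is hypothetical and keeps everything in memory, so it
says nothing about how git actually stores objects on disk:

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name a blob by the SHA-1 of 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

class ObjectStore:
    """Hypothetical in-memory store keyed by object name."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, content: bytes) -> str:
        # (3) identification: the name is derived from the content itself.
        oid = blob_id(content)
        self._objects[oid] = content
        return oid

    def get(self, oid: str) -> bytes:
        # (2) fast lookup: the name is the key.
        content = self._objects[oid]
        # (1) integrity: re-hash and compare against the name we asked for.
        if blob_id(content) != oid:
            raise ValueError(f"object {oid} is corrupt")
        return content

store = ObjectStore()
oid = store.put(b"hello\n")
```

None of these three uses needs resistance against a deliberate attacker.
They only need accidental collisions to be vanishingly unlikely.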

		Linus