On Mon, 28 Aug 2006, Johannes Schindelin wrote:
> >
> > Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256
> > should be fairly straightforward. It's never been used since the early
> > days (and has limits like a maximum of a million objects etc that can need
> > fixing), but it shouldn't be "fundamentally hard" per se.
>
> But what about signed tags? (This issue has come up before, but never has
> been addressed.)

Signed tags fundamentally have to be re-signed. That's by design: if
somebody could rewrite an archive and signed tags would still be accepted
to have the right signature, that would be a _serious_ sign of a totally
broken security model.

The git security model isn't broken.

> I also thought about supporting hybrid hashes, i.e. that older objects
> still can be hashed with SHA-1. Alas, a simple thought experiment
> demonstrates how silly that idea is: most of the objects will not change
> between two revisions, and they'd have to be rehashed with SHA-256 (or
> whatever we decide upon) anyway, so hybrids would do no good.

Indeed. Hybrids would not only do no good, but they would actually
_actively_ hurt things, because they'd fundamentally break the notion that
the hash being identical means that the object (blob, tree, subtree) is
the same.

So allowing two names for the same object is very fundamentally wrong in
git-speak.

> A better idea would be to increment the repository version, and expect
> SHA-1 for version 1, SHA-256 for version >= 2.

Yes. It would be reasonably painful for users, though (as Krzysztof
correctly points out). Every client would have to convert when a
repository they track is converted.

> Even if the breakthrough really comes to full SHA-1, you still have to add
> _at least_ 20 bytes of gibberish. Which would be harder to spot, but it
> would be spotted.

Yeah, I don't think this is at all critical, especially since git really
on a security level doesn't _depend_ on the hashes being cryptographically
secure.

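
To make the point about hybrids concrete, here is a minimal Python sketch.
It assumes git's usual object naming (the hash is taken over a
"<type> <size>" header, a NUL byte, and then the contents). The helper
names git_object_name and same_object are hypothetical, and using Python's
hashlib is purely for illustration; none of this is code from git itself.

```python
import hashlib


def git_object_name(kind: str, body: bytes, algo: str = "sha1") -> str:
    """Hash '<kind> <size>' + NUL + body, the way git names an object."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def same_object(name_a: str, name_b: str) -> bool:
    """The comparison git relies on: identical name means identical object."""
    return name_a == name_b


content = b"hello\n"

# One hash function: the same contents always get the same name.
sha1_name = git_object_name("blob", content, "sha1")
assert same_object(sha1_name, git_object_name("blob", content, "sha1"))

# Hybrid store: the same contents now have two different names, so
# comparing names no longer tells you that the objects are the same.
sha256_name = git_object_name("blob", content, "sha256")
print(same_object(sha1_name, sha256_name))  # False, despite equal contents
```

With a single hash per repository, same_object is a cheap and reliable
equality test for blobs and whole trees. With two hashes in play it starts
giving false negatives, which is exactly the breakage described above.
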
As I explained early on (ie over a year ago, back when the whole design of
git was being discussed), the _security_ of git actually depends not on
cryptographic hashes, but simply on everybody being able to secure their
own _private_ repository.

So the only thing git really _requires_ is a hash that is _unique_ for the
developer (and there we are talking not of an _attacker_, but a benign
participant).

That said, the cryptographic security of SHA-1 is obviously a real bonus.
So I'd be disappointed if SHA-1 can be broken more easily (and I obviously
already argued against using MD5, exactly because generating duplicates of
that is fairly easy). But it's not "fundamentally required" in git per se.

[ The one exception: the "signed tags" security does depend on the hashes
  being cryptographically strong. So again, breaking SHA-1 would not mean
  that git stops working, but it _would_ potentially mean that if you
  don't trust your own _private_ repository, the signed tag may no longer
  protect you entirely ]

> This made me think about the use of hashes in git. Why do we need a hash
> here (in no particular order):
>
> 1) integrity checking,
> 2) fast lookup,
> 3) identifying objects (related to (2)),
> 4) trust.
>
> Except for (4), I do not see why SHA-1 -- even if broken -- should not be
> adequate. It is not like somebody found out that all JPGs tend to have
> similar hashes so that collisions are more likely.

Correct. I'm pretty sure we had exactly this discussion around May 2005,
but I'm too lazy to search ;)

		Linus