(I'm not sure why you dropped git@vger. I see nothing private here, so
I'm bringing git@vger back.)

On Fri, Dec 2, 2011 at 3:08 PM, Bill Zaumen <bill.zaumen@xxxxxxxxx> wrote:
> At one point Nguyen said that "What I'm thinking is whether it's
> possible to decouple two sha-1 roles in git, as object identifier
> and digest, separately. Each sha-1 identifies an object and an extra
> set of digests on the "same" object."
>
> My code pretty much does that (it just uses a CRC instead of a real
> digest, but I can easily change that).

It'd be easier to look at your code if you split it into a series of
smaller patches.

> So the question is whether
> using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> solution than simply replacing SHA-1.

I would not stick with any algorithm permanently. No one knows when
SHA-256 might be broken.

> If there is some interest in pursuing it further, I could make those
> changes fairly easily. Then you'd have two message digests, a SHA-1
> and a longer one, with the longer one stored parallel to the actual
> object. Then it becomes easy to compute a digest of all the digests
> in a commit's tree and store that in a commit, if that is what you
> want to do.

I personally would like to see how it works out, especially when
computing new digests is much more expensive than SHA-1. And I hope
that by delaying computing new digests (stored outside actual objects),
we could make minimal code changes to git. Though security concerns may
be the killer factor and I haven't worked that out yet.

> Replacing SHA-1 with something like SHA-256 sounds easier to implement,

SHA-1 characteristics (like the 20-byte length) are hardcoded everywhere
in git, so it'd be a big audit.

> but the problem is all the existing repositories. While rewriting all
> the objects and trees to use new hashes is similar to a rebase in most
> cases, there is a complication - submodules.
> Git stores the hash of
> a submodule's commit in its tree because a particular revision of
> a project 'goes' with a particular revision of a submodule. But, a
> submodule can exist in one revision and not in the next or previous
> revision. Furthermore A could be a submodule of B at one point in time,
> and many commits later, B could end up being a submodule of A.
> Fixing it up could be pretty complicated (plus having to deal with
> network failures - to update GitHub for example, you'd have to download
> submodules it uses, possibly from somewhere else and some submodules may
> not be publicly accessible (e.g., a private project kept on GitHub but
> with a critical submodule kept in house behind a corporate firewall)).
> Also, you might have to update a git repository and its submodules
> concurrently, so that you always can find a new value when you need
> it.
>
> My guess is that this could be far more complicated than what I did.
> Excluding two files that are not used (the symbol PACKDB is not
> defined), I added two new files, crcdb.h and objd-crcdb.c which store
> CRCs for loose objects - 517 lines total including lots of comments in
> the header file - full documentation for each function. The other
> changes include 1475 lines of new code in previously existing git files
> and 136 deletions (most trivial). There were also minor changes to
> the makefile and test scripts.

You'd need to convince the git maintainer this is worth doing first,
before talking about how big the changes are ;-)

> Bill
--
Duy
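
For illustration only, the decoupling discussed in this thread can be
sketched in a few lines of Python. This is not git's code and not
Bill's patch. It assumes only the well-known rule that an object ID is
the SHA-1 of "<type> <size>\0" followed by the content. Everything
else is hypothetical: the names DigestStore, git_object_id and
combined_digest, the choice of SHA-256 as the extra digest, and the way
per-object digests are folded into one value. The extra digest lives
outside the object and is computed only on first request.

```python
import hashlib


def _framed(obj_type: str, data: bytes) -> bytes:
    # Git hashes "<type> <size>\0" followed by the raw content.
    return f"{obj_type} {len(data)}".encode("ascii") + b"\x00" + data


def git_object_id(obj_type: str, data: bytes) -> str:
    # SHA-1 keeps its role as the object identifier.
    return hashlib.sha1(_framed(obj_type, data)).hexdigest()


class DigestStore:
    """Hypothetical side table: SHA-1 object ID -> stronger digest."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}
        self._digests: dict[str, str] = {}  # kept apart from the objects

    def add(self, obj_type: str, data: bytes) -> str:
        oid = git_object_id(obj_type, data)
        self._objects[oid] = (obj_type, data)
        return oid

    def digest(self, oid: str) -> str:
        # Delayed computation: the expensive digest is produced on demand
        # and cached afterwards.
        if oid not in self._digests:
            obj_type, data = self._objects[oid]
            self._digests[oid] = hashlib.sha256(
                _framed(obj_type, data)
            ).hexdigest()
        return self._digests[oid]

    def combined_digest(self, oids: list[str]) -> str:
        # Simplified "digest of all the digests": hash the per-object
        # digests in sorted-ID order. A real design would also have to
        # cover paths and modes; this sketch does not.
        h = hashlib.sha256()
        for oid in sorted(oids):
            h.update(bytes.fromhex(self.digest(oid)))
        return h.hexdigest()


if __name__ == "__main__":
    store = DigestStore()
    a = store.add("blob", b"hello world\n")
    b = store.add("blob", b"")
    print(a, store.digest(a))
    print(store.combined_digest([a, b]))
```

Keeping the side table keyed by the existing ID means nothing that
relies on the 20-byte SHA-1 has to change, which is the appeal of the
approach. Whether it is actually secure is the open question raised
above.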