Hi again, Sorry for the slow review. I finally got a chance to look this over again. My main nits are about the commit message: I think it still focuses too much on the process instead of the usual "knowing what I know now, here's the clearest explanation for why we need this patch" approach. I can send a patch illustrating what I mean tomorrow morning. Ævar Arnfjörð Bjarmason wrote: > From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, > K12, and so on are all believed to have similar security properties. > All are good options from a security point of view. > > SHA-256 has a number of advantages: > > * It has been around for a while, is widely used, and is supported by > just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, > SecureTransport, etc). > > * When you compare against SHA1DC, most vectorized SHA-256 > implementations are indeed faster, even without acceleration. > > * If we're doing signatures with OpenPGP (or even, I suppose, CMS), > we're going to be using SHA-2, so it doesn't make sense to have our > security depend on two separate algorithms when either one of them > alone could break the security when we could just depend on one. > > So SHA-256 it is. The above is what I wrote, so of course I'd like it. ;-) > See the "Hash algorithm analysis" thread as of > [1]. Linus has come around to this choice and suggested Junio make the > final pick, and he's endorsed SHA-256 [3]. The above paragraph has the same problem as before of (1) not being self-contained and (2) focusing on the process that led to this patch instead of the benefit of the patch itself. I think we should omit it. > This follow-up change changes occurrences of "NewHash" to > "SHA-256" (or "sha256", depending on the context). The "Selection of a > New Hash" section has also been changed to note that historically we > used the the "NewHash" name while we didn't know what the new hash > function would be. nit: Commit messages are usually in the imperative tense. This is in the past tense, I think because it is a continuation of that discussion about process. For this part, I think we can let the patch speak for itself. > This leaves no use of "NewHash" anywhere in git.git except in the > aforementioned section (and as a variable name in t/t9700/test.pl, but > that use from 2008 has nothing to do with this transition plan). This part is helpful --- good. > 1. https://public-inbox.org/git/20180720215220.GB18502@xxxxxxxxxxxxxxxxxxxxxxxxxx/ > 2. https://public-inbox.org/git/CA+55aFwSe9BF8e0hLk9pp3FVD5LaVY5GRdsV3fbNtgzekJadyA@xxxxxxxxxxxxxx/ > 3. https://public-inbox.org/git/xmqqzhygwd5o.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/ Footnotes to the historical part --- I'd recommend removing these. > Helped-by: Jonathan Nieder <jrnieder@xxxxxxxxx> > Helped-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> Here I'd want to put a pile of acks, e.g.: Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Acked-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> Acked-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx> Acked-by: Dan Shumow <danshu@xxxxxxxxxxxxx> Acked-by: Junio C Hamano <gitster@xxxxxxxxx> [...] > --- a/Documentation/technical/hash-function-transition.txt > +++ b/Documentation/technical/hash-function-transition.txt > @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. > > Goals > ----- > -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see > -"Selection of a New Hash", below): > - > -1. The transition to NewHash can be done one local repository at a time. > +1. The transition to SHA-256 can be done one local repository at a time. Yay! [...] > [extensions] > - objectFormat = newhash > + objectFormat = sha256 > compatObjectFormat = sha1 Yes, makes sense. [...] > @@ -155,36 +152,36 @@ repository extensions. > Object names > ~~~~~~~~~~~~ > Objects can be named by their 40 hexadecimal digit sha1-name or 64 > -hexadecimal digit newhash-name, plus names derived from those (see > +hexadecimal digit sha256-name, plus names derived from those (see > gitrevisions(7)). > > The sha1-name of an object is the SHA-1 of the concatenation of its > type, length, a nul byte, and the object's sha1-content. This is the > traditional <sha1> used in Git to name objects. > > -The newhash-name of an object is the NewHash of the concatenation of its > -type, length, a nul byte, and the object's newhash-content. > +The sha256-name of an object is the SHA-256 of the concatenation of its > +type, length, a nul byte, and the object's sha256-content. Sensible. [...] > > Object format > ~~~~~~~~~~~~~ > The content as a byte sequence of a tag, commit, or tree object named > -by sha1 and newhash differ because an object named by newhash-name refers to > +by sha1 and sha256 differ because an object named by sha256-name refers to Not about this patch: this should say SHA-1 and SHA-256, I think. Leaving it as is in this patch as you did is the right thing. [...] > @@ -255,10 +252,10 @@ network byte order): > up to and not including the table of CRC32 values. > - Zero or more NUL bytes. > - The trailer consists of the following: > - - A copy of the 20-byte NewHash checksum at the end of the > + - A copy of the 20-byte SHA-256 checksum at the end of the Not about this patch: a SHA-256 is 32 bytes. Leaving that for a separate patch as you did is the right thing, though. [...] > - - 20-byte NewHash checksum of all of the above. > + - 20-byte SHA-256 checksum of all of the above. Likewise. [...] > @@ -351,12 +348,12 @@ the following steps: > (This list only contains objects reachable from the "wants". If the > pack from the server contained additional extraneous objects, then > they will be discarded.) > -3. convert to newhash: open a new (newhash) packfile. Read the topologically > +3. convert to sha256: open a new (sha256) packfile. Read the topologically Not about this patch: this one's more murky, since it's talking about the object names instead of the hash function. I guess "sha256" instead of "SHA-256" for this could be right, but I worry it's going to take time for me to figure out the exact distinction. That seems like a reason to just call it SHA-256 (but in a separate patch). [...] > - sha1-content, convert to newhash-content, and write it to the newhash > - pack. Record the new sha1<->newhash mapping entry for use in the idx. > + sha1-content, convert to sha256-content, and write it to the sha256 > + pack. Record the new sha1<->sha256 mapping entry for use in the idx. > 4. sort: reorder entries in the new pack to match the order of objects > - in the pack the server generated and include blobs. Write a newhash idx > + in the pack the server generated and include blobs. Write a sha256 idx > file Likewise. [...] > @@ -388,16 +385,16 @@ send-pack. > > Signed Commits > ~~~~~~~~~~~~~~ > -We add a new field "gpgsig-newhash" to the commit object format to allow > +We add a new field "gpgsig-sha256" to the commit object format to allow > signing commits without relying on SHA-1. It is similar to the > -existing "gpgsig" field. Its signed payload is the newhash-content of the > -commit object with any "gpgsig" and "gpgsig-newhash" fields removed. > +existing "gpgsig" field. Its signed payload is the sha256-content of the > +commit object with any "gpgsig" and "gpgsig-sha256" fields removed. That reminds me --- we need to add support for stripping these out. [...] > @@ -601,18 +598,22 @@ The user can also explicitly specify which format to use for a > particular revision specifier and for output, overriding the mode. For > example: > > -git --output-format=sha1 log abac87a^{sha1}..f787cac^{newhash} > +git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} > > -Selection of a New Hash > ------------------------ > +Choice of Hash > +-------------- Yay! [...] > -Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, > -K12, and BLAKE2bp-256. > +We choose SHA-256. See the thread starting at > +<20180609224913.GC38834@xxxxxxxxxxxxxxxxxxxxxxxxxx> for the discussion > +(https://public-inbox.org/git/20180609224913.GC38834@xxxxxxxxxxxxxxxxxxxxxxxxxx/) Can this reference be moved to a footnote? It's not part of the design, but it's a good reference. Thanks again for getting this documented. Sincerely, Jonathan