On Fri, Aug 03 2018, Jonathan Nieder wrote: > Hi again, > > Sorry for the slow review. I finally got a chance to look this over > again. > > My main nits are about the commit message: I think it still focuses > too much on the process instead of the usual "knowing what I know now, > here's the clearest explanation for why we need this patch" approach. > I can send a patch illustrating what I mean tomorrow morning. I think it makes if you just take over 2/2 of this series (or even the whole thing), since the meat of it is already something I copy/pasted from you, and you've got more of an opinion / idea about how to proceed (which is good!); it's more efficient than me trying to fix various stuff you're pointing out at this point, I also think it makes sense that you change the "Author" line for that, since the rest of the changes will mainly be search-replace by me. Perhaps it's better for readability if those search-replace changes go into their own change, i.e. make it a three-part where 2/3 does the search-replace, and promises that 3/3 has the full rationale etc. > Ævar Arnfjörð Bjarmason wrote: > >> From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, >> K12, and so on are all believed to have similar security properties. >> All are good options from a security point of view. >> >> SHA-256 has a number of advantages: >> >> * It has been around for a while, is widely used, and is supported by >> just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, >> SecureTransport, etc). >> >> * When you compare against SHA1DC, most vectorized SHA-256 >> implementations are indeed faster, even without acceleration. >> >> * If we're doing signatures with OpenPGP (or even, I suppose, CMS), >> we're going to be using SHA-2, so it doesn't make sense to have our >> security depend on two separate algorithms when either one of them >> alone could break the security when we could just depend on one. >> >> So SHA-256 it is. > > The above is what I wrote, so of course I'd like it. ;-) > >> See the "Hash algorithm analysis" thread as of >> [1]. Linus has come around to this choice and suggested Junio make the >> final pick, and he's endorsed SHA-256 [3]. > > The above paragraph has the same problem as before of (1) not being > self-contained and (2) focusing on the process that led to this patch > instead of the benefit of the patch itself. I think we should omit it. > >> This follow-up change changes occurrences of "NewHash" to >> "SHA-256" (or "sha256", depending on the context). The "Selection of a >> New Hash" section has also been changed to note that historically we >> used the the "NewHash" name while we didn't know what the new hash >> function would be. > > nit: Commit messages are usually in the imperative tense. This is in > the past tense, I think because it is a continuation of that > discussion about process. > > For this part, I think we can let the patch speak for itself. > >> This leaves no use of "NewHash" anywhere in git.git except in the >> aforementioned section (and as a variable name in t/t9700/test.pl, but >> that use from 2008 has nothing to do with this transition plan). > > This part is helpful --- good. > >> 1. https://public-inbox.org/git/20180720215220.GB18502@xxxxxxxxxxxxxxxxxxxxxxxxxx/ >> 2. https://public-inbox.org/git/CA+55aFwSe9BF8e0hLk9pp3FVD5LaVY5GRdsV3fbNtgzekJadyA@xxxxxxxxxxxxxx/ >> 3. https://public-inbox.org/git/xmqqzhygwd5o.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/ > > Footnotes to the historical part --- I'd recommend removing these. > >> Helped-by: Jonathan Nieder <jrnieder@xxxxxxxxx> >> Helped-by: Junio C Hamano <gitster@xxxxxxxxx> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > > Here I'd want to put a pile of acks, e.g.: > > Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Acked-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> > Acked-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx> > Acked-by: Dan Shumow <danshu@xxxxxxxxxxxxx> > Acked-by: Junio C Hamano <gitster@xxxxxxxxx> > > [...] >> --- a/Documentation/technical/hash-function-transition.txt >> +++ b/Documentation/technical/hash-function-transition.txt >> @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. >> >> Goals >> ----- >> -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see >> -"Selection of a New Hash", below): >> - >> -1. The transition to NewHash can be done one local repository at a time. >> +1. The transition to SHA-256 can be done one local repository at a time. > > Yay! > > [...] >> [extensions] >> - objectFormat = newhash >> + objectFormat = sha256 >> compatObjectFormat = sha1 > > Yes, makes sense. > > [...] >> @@ -155,36 +152,36 @@ repository extensions. >> Object names >> ~~~~~~~~~~~~ >> Objects can be named by their 40 hexadecimal digit sha1-name or 64 >> -hexadecimal digit newhash-name, plus names derived from those (see >> +hexadecimal digit sha256-name, plus names derived from those (see >> gitrevisions(7)). >> >> The sha1-name of an object is the SHA-1 of the concatenation of its >> type, length, a nul byte, and the object's sha1-content. This is the >> traditional <sha1> used in Git to name objects. >> >> -The newhash-name of an object is the NewHash of the concatenation of its >> -type, length, a nul byte, and the object's newhash-content. >> +The sha256-name of an object is the SHA-256 of the concatenation of its >> +type, length, a nul byte, and the object's sha256-content. > > Sensible. > > [...] >> >> Object format >> ~~~~~~~~~~~~~ >> The content as a byte sequence of a tag, commit, or tree object named >> -by sha1 and newhash differ because an object named by newhash-name refers to >> +by sha1 and sha256 differ because an object named by sha256-name refers to > > Not about this patch: this should say SHA-1 and SHA-256, I think. > Leaving it as is in this patch as you did is the right thing. > > [...] >> @@ -255,10 +252,10 @@ network byte order): >> up to and not including the table of CRC32 values. >> - Zero or more NUL bytes. >> - The trailer consists of the following: >> - - A copy of the 20-byte NewHash checksum at the end of the >> + - A copy of the 20-byte SHA-256 checksum at the end of the > > Not about this patch: a SHA-256 is 32 bytes. Leaving that for a > separate patch as you did is the right thing, though. > > [...] >> - - 20-byte NewHash checksum of all of the above. >> + - 20-byte SHA-256 checksum of all of the above. > > Likewise. > > [...] >> @@ -351,12 +348,12 @@ the following steps: >> (This list only contains objects reachable from the "wants". If the >> pack from the server contained additional extraneous objects, then >> they will be discarded.) >> -3. convert to newhash: open a new (newhash) packfile. Read the topologically >> +3. convert to sha256: open a new (sha256) packfile. Read the topologically > > Not about this patch: this one's more murky, since it's talking about > the object names instead of the hash function. I guess "sha256" > instead of "SHA-256" for this could be right, but I worry it's going > to take time for me to figure out the exact distinction. That seems > like a reason to just call it SHA-256 (but in a separate patch). > > [...] >> - sha1-content, convert to newhash-content, and write it to the newhash >> - pack. Record the new sha1<->newhash mapping entry for use in the idx. >> + sha1-content, convert to sha256-content, and write it to the sha256 >> + pack. Record the new sha1<->sha256 mapping entry for use in the idx. >> 4. sort: reorder entries in the new pack to match the order of objects >> - in the pack the server generated and include blobs. Write a newhash idx >> + in the pack the server generated and include blobs. Write a sha256 idx >> file > > Likewise. > > [...] >> @@ -388,16 +385,16 @@ send-pack. >> >> Signed Commits >> ~~~~~~~~~~~~~~ >> -We add a new field "gpgsig-newhash" to the commit object format to allow >> +We add a new field "gpgsig-sha256" to the commit object format to allow >> signing commits without relying on SHA-1. It is similar to the >> -existing "gpgsig" field. Its signed payload is the newhash-content of the >> -commit object with any "gpgsig" and "gpgsig-newhash" fields removed. >> +existing "gpgsig" field. Its signed payload is the sha256-content of the >> +commit object with any "gpgsig" and "gpgsig-sha256" fields removed. > > That reminds me --- we need to add support for stripping these out. > > [...] >> @@ -601,18 +598,22 @@ The user can also explicitly specify which format to use for a >> particular revision specifier and for output, overriding the mode. For >> example: >> >> -git --output-format=sha1 log abac87a^{sha1}..f787cac^{newhash} >> +git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} >> >> -Selection of a New Hash >> ------------------------ >> +Choice of Hash >> +-------------- > > Yay! > > [...] >> -Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, >> -K12, and BLAKE2bp-256. >> +We choose SHA-256. See the thread starting at >> +<20180609224913.GC38834@xxxxxxxxxxxxxxxxxxxxxxxxxx> for the discussion >> +(https://public-inbox.org/git/20180609224913.GC38834@xxxxxxxxxxxxxxxxxxxxxxxxxx/) > > Can this reference be moved to a footnote? It's not part of the > design, but it's a good reference. > > Thanks again for getting this documented. > > Sincerely, > Jonathan