On Wed, Aug 29 2018, brian m. carlson wrote: > SHA-1 is weak and we need to transition to a new hash function. For > some time, we have referred to this new function as NewHash. > > The selection criteria for NewHash specify that it should (a) be 256 > bits in length, (b) have high quality implementations available, (c) > should match Git's needs in terms of security, and (d) ideally, be fast > to compute. > > SHA-256 has a variety of high quality implementations across various > libraries. It is implemented by every cryptographic library we support > and is available on every platform and in almost every programming > language. It is often highly optimized, since it is commonly used in > TLS and elsewhere. Additionally, there are various command line > utilities that implement it, which is useful for educational and testing > purposes. > > SHA-256 is presently considered secure and has received a reasonable > amount of cryptanalysis in the literature. It is, admittedly, not > resistant to length extension attacks, but Git object storage is immune > to those due to the length field at the beginning. > > SHA-256 is somewhat slower to compute than SHA-1 in software. However, > since our default SHA-1 implementation is collision-detecting, a > reasonable cryptographic library implementation of SHA-256 will actually > be faster than SHA-256. In addition, modern ARM and AMD processors (and > some Intel processors) contain instructions for implementing SHA-256 in > hardware, making it the fastest possible option. > > There are other reasons to select SHA-256. With signed commits and > tags, it's possible to use SHA-256 for signatures and therefore have to > rely on only one hash algorithm for security. None of this is wrong, but I think this would be better off as a simple "See Documentation/technical/hash-function-transition.txt for why we're switching to SHA-256", and to the extent that something is said here that isn't said there it could be a patch to amend that document. > Add a basic implementation of SHA-256 based off libtomcrypt, which is in > the public domain. Optimize it and tidy it somewhat. For future changes & maintenance of this, let's do that in two steps. One where we add the upstream code as-is, and another where the tidying / cleanup / git specific stuff is wired, which makes it easy to audit upstream as-is v.s. our changes in isolation. Also in the first of those commits, say in the commit message "add a [libtomcrypt] copy from such-and-such a URL at such-and-such a version", so that it's easy to reproduce the import & find out how to re-update it. Is this something we see ourselves perma-forking? Or as with sha1dc are we likely to pull in upstream changes from time-to-time?SHA256 obiously isn't under active development, but there's been some churn in the upstream code since it was added, and if you're doing some optimizing / tidying that's presumably something upstream could benefit from as well, as well as just us being nicer open source citizens feeding e.g. portability fixes to upstream (since git tends to get ported a lot). So I wonder if we can't convince them to add a few macros to their code, and then do something like what I did in a0103914c2 ("sha1dc: update from upstream", 2017-05-20) for sha1dc allowing us to use their code as-is with some defines in the Makefile, which both makes it easier to update, and sets up a process where our default approach is to submit changes upstream, instead of working on our perma-fork.