Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 29 Aug 2018 11:32:08 +0200

On Wed, Aug 29 2018, brian m. carlson wrote:

> SHA-1 is weak and we need to transition to a new hash function.  For
> some time, we have referred to this new function as NewHash.
>
> The selection criteria for NewHash specify that it should (a) be 256
> bits in length, (b) have high quality implementations available, (c)
> should match Git's needs in terms of security, and (d) ideally, be fast
> to compute.
>
> SHA-256 has a variety of high quality implementations across various
> libraries.  It is implemented by every cryptographic library we support
> and is available on every platform and in almost every programming
> language.  It is often highly optimized, since it is commonly used in
> TLS and elsewhere.  Additionally, there are various command line
> utilities that implement it, which is useful for educational and testing
> purposes.
>
> SHA-256 is presently considered secure and has received a reasonable
> amount of cryptanalysis in the literature.  It is, admittedly, not
> resistant to length extension attacks, but Git object storage is immune
> to those due to the length field at the beginning.
>
> SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
> since our default SHA-1 implementation is collision-detecting, a
> reasonable cryptographic library implementation of SHA-256 will actually
> be faster than SHA-1.  In addition, modern ARM and AMD processors (and
> some Intel processors) contain instructions for implementing SHA-256 in
> hardware, making it the fastest possible option.
>
> There are other reasons to select SHA-256.  With signed commits and
> tags, it's possible to use SHA-256 for signatures and therefore have to
> rely on only one hash algorithm for security.

None of this is wrong, but I think this would be better off as a simple
"See Documentation/technical/hash-function-transition.txt for why we're
switching to SHA-256", and to the extent that something is said here
that isn't said there it could be a patch to amend that document.
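
For anyone following along, the "length field at the beginning" mentioned
in the quoted text is the object header that Git hashes together with the
content. A minimal sketch of that, assuming Python's hashlib (the helper
name git_object_id is made up for illustration and is not anything in
git.git):

```python
import hashlib

def git_object_id(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Hash an object the way Git does: header first, then the raw content.

    Hypothetical helper for illustration only. The header is
    "<type> <size in bytes>" followed by a NUL byte, so the length is
    fixed into the hash input before any of the content is seen.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

# Same blob, hashed under the current and the proposed algorithm.
print(git_object_id("blob", b"hello\n"))
print(git_object_id("blob", b"hello\n", "sha256"))
```

The SHA-1 result should match what "git hash-object" prints for the same
content; only the digest function changes for SHA-256, the header stays
the same.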

> Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> the public domain.  Optimize it and tidy it somewhat.

For future changes & maintenance of this, let's do that in two
steps: one where we add the upstream code as-is, and another where the
tidying / cleanup / git-specific stuff is wired in, which makes it easy to
audit upstream as-is vs. our changes in isolation. Also, in the first of
those commits, say in the commit message "add a [libtomcrypt] copy from
such-and-such a URL at such-and-such a version", so that it's easy to
reproduce the import & find out how to re-update it.

Is this something we see ourselves perma-forking? Or, as with sha1dc, are
we likely to pull in upstream changes from time to time? SHA-256 obviously
isn't under active development, but there's been some churn in the
upstream code since it was added, and if you're doing some optimizing /
tidying, that's presumably something upstream could benefit from too. It
would also just make us nicer open source citizens, feeding
e.g. portability fixes to upstream (since git tends to get ported a
lot).

So I wonder if we can't convince them to add a few macros to their code,
and then do something like what I did in a0103914c2 ("sha1dc: update
from upstream", 2017-05-20) for sha1dc allowing us to use their code
as-is with some defines in the Makefile, which both makes it easier to
update, and sets up a process where our default approach is to submit
changes upstream, instead of working on our perma-fork.