Re: RFC: Another proposed hash function transition plan

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 17 Mar 2017 12:07:48 +0100 (CET)

Hi Kostis,

On Mon, 13 Mar 2017, ankostis wrote:

> On 13 March 2017 at 18:48, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> >
> > The Keccak Team wrote:
> >
> > > We have read your transition plan to move away from SHA-1 and
> > > noticed your intent to use SHA3-256 as the new hash function in the
> > > new Git repository format and protocol. Although this is a valid
> > > choice, we think that the new SHA-3 standard proposes alternatives
> > > that may also be interesting for your use cases.  As designers of
> > > the Keccak function family, we thought we could jump in the mail
> > > thread and present these alternatives.
> >
> > I indeed had some reservations about SHA3-256's performance.  The main
> > hash function we had in mind to compare against is blake2bp-256.  This
> > overview of other functions to compare against should end up being
> > very helpful.
> 
> What if some of us need this extra difficulty, and don't mind about the
> performance tax, because we need to refer to hashes 10 or 30 years from
> now, or even in the Post Quantum era?

If you need this extra difficulty, and if this extra difficulty would
imply a huge penalty for everybody else, it is safe to assume that that
extra difficulty would need to be an extra switch, off by default.

It simply shows that we put too much of a burden on SHA-1. We used it for
three separate purposes: to verify data integrity, to allow addressing
objects by their own content, and for signing entire commit histories
cryptographically (more as an afterthought, as I see it: the Linux project
provides the context where you never fetch from any untrusted source,
therefore cryptographically secure signatures are not quite as important
as the trust between maintainer and lieutenants).

We *will* have to separate those concerns, and maybe even switch to
different algorithms for the different concerns. There are much better
algorithms for validating data integrity, for example, including error
correction (which SHA-1 never wanted to do anyway).
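To make the separation a bit more concrete, here is a minimal Python
sketch. It assumes CRC-32 as a stand-in for a cheap integrity check and
keeps a Git-style SHA-1 purely for content addressing. The function names
are hypothetical, and CRC-32 only detects corruption; actual error
correction would need a real error-correcting code on top.

```python
import hashlib
import zlib


def integrity_tag(data: bytes) -> int:
    """Cheap checksum used only to detect accidental corruption."""
    return zlib.crc32(data)


def is_intact(data: bytes, tag: int) -> bool:
    """Recompute the checksum and compare it with the stored tag."""
    return zlib.crc32(data) == tag


def content_address(data: bytes) -> str:
    """Name a blob by its content, the way Git does with SHA-1."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


if __name__ == "__main__":
    doc = b"hello"
    tag = integrity_tag(doc)
    print(is_intact(doc, tag))       # unchanged data passes the check
    print(is_intact(b"hellp", tag))  # a corrupted byte is detected
    print(content_address(doc))      # the name is independent of the tag
```

The point is only that the two jobs can be done by two different
functions, each chosen for its own purpose.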

In your case, I could imagine that you would simply require verifiable
cryptographic signatures (.asc files) to be committed together with the
documents; it would be much harder to find a collision where those
signatures still match (or a double collision where the forged document's
signature would collide with the non-forged document's signature, in
addition to the two documents colliding).

Another idea would be to use Jonathan Nieder's proposed transition plan
and simply extend it. That transition plan details how the objects would
be hashed with two algorithms locally and how to maintain a bidirectional
mapping between the two. You could simply piggyback on that code and
provide patches that allow for a third, configurable algorithm, and that
algorithm's hashes would simply be added to the commit objects and fsck
would then know to verify those, too. That would be an opt-in feature, of
course, so that only those who need the extra long term security have to
pay the price of substantially slower hashing.
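
A rough Python sketch of what I have in mind follows. It is not how the
transition plan's code looks; the names object_names, NameMap and verify
are made up for illustration, SHA-1 plus SHA3-256 are assumed as the two
local algorithms, and the third algorithm is simply whatever hashlib name
the user configures.

```python
import hashlib

# Assumption: the two algorithms every repository would hash with locally.
DEFAULT_ALGORITHMS = ("sha1", "sha3_256")


def _framed(obj_type: str, payload: bytes) -> bytes:
    # Git hashes "<type> <size>\0" followed by the raw payload.
    return f"{obj_type} {len(payload)}".encode("ascii") + b"\0" + payload


def object_names(obj_type, payload, extra_algorithm=None):
    """Hash one object under every configured algorithm."""
    algorithms = list(DEFAULT_ALGORITHMS)
    if extra_algorithm is not None:  # opt-in, off by default
        algorithms.append(extra_algorithm)
    data = _framed(obj_type, payload)
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


class NameMap:
    """Mapping between an object's names, usable in either direction."""

    def __init__(self):
        self._by_name = {}  # (algorithm, hex digest) -> all names

    def add(self, names):
        for algorithm, digest in names.items():
            self._by_name[(algorithm, digest)] = names

    def translate(self, algorithm, digest, target):
        return self._by_name[(algorithm, digest)][target]


def verify(obj_type, payload, recorded):
    """fsck-like check: recompute every recorded hash and compare."""
    data = _framed(obj_type, payload)
    return all(
        hashlib.new(algorithm, data).hexdigest() == digest
        for algorithm, digest in recorded.items()
    )


if __name__ == "__main__":
    names = object_names("blob", b"contract text", extra_algorithm="sha3_512")
    mapping = NameMap()
    mapping.add(names)
    print(mapping.translate("sha3_256", names["sha3_256"], "sha1"))
    print(verify("blob", b"contract text", names))
```

Only repositories that pass an extra algorithm compute and verify the
third hash; everybody else keeps the default cost.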

What we cannot do is to pick a super slow hash algorithm just to cater to
the use case where legal documents are managed, punishing everybody else
for using Git in the intended way: to manage source code.

Ciao,
Johannes