Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 29 Mar 2017 22:02:17 +0200 (CEST)

Hi Junio,

On Fri, 24 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@xxxxxx> writes:
> 
> > - the most important part will be the patch turning core.enableSHA1DC
> > into a tristate: "externalOnly" or "smart" or "auto" or something
> > indicating that it switches on collision detection only for commands
> > that accept objects from an outside source into the local repository,
> > such as fetch, fast-import, etc
> 
> There are different uses of SHA-1 hashing in Git, and I do agree
> that depending on the use, some of them do not need the overhead for
> the collision-attack detection.

Indeed.

I guess I should have clarified very clearly what I intended to accomplish
with this preview.

Let me first summarize the background:

- Git originally used SHA-1 as a convenient catch-all hashing algorithm,
  with the security of the hash merely being a nice afterthought.
  Unfortunately, SHA-1 was hardcoded. Mistakes were made. They always are.

- After the SHAttered blog post became public, Linus first made the case
  that it matters not all that much: the development of the Linux kernel
  is based on trust, and nobody would pull from a person they do not trust.
  This approach does obviously not extend to most other projects.

- By switching the default to DC_SHA1 already in `master`, you now took
  the exact opposite position: it *always* matters, even when you trust
  people, and the 6x slowdown demonstrated by my perf test is something that
  everybody needs to accept, even if it is spent for no benefit in return.

Between these two extremes ("collision attacks do not matter if you run
your project based on trust" vs "we always mistrust everybody, including
the user's own source code that they stage for commit"), I think there are
many shades of green, and I think it would be delusional to believe that
we can predict the trust model for each and every Git user [*1*], baking
it into a single compile time setting.

That is why I wanted to implement a tristate config so that users can
adapt Git to their particular scenario. That way, maintainers of
precompiled Git packages do not have to dictate to Git users what trust
model they should use.

One scenario seems to be common, and it is certainly one that I have a
direct interest in supporting: inside a company, where the server as well
as the developers are implicitly trusted not to fiddle with collision
attacks (because there are much easier ways to hide malicious code, let's
be frank). And in this scenario, slowing down the SHA-1 computation half
an order of magnitude by trying to detect collision attacks is simply
unacceptable, because there is no benefit, only cost to it.

An even more common scenario is when a developer works on a local
repository, adds a couple of files, then runs rebase, then merges, etc. In
*none* of these cases does the developer distrust any of the objects
flying about. Like above, forcing the developer to accept half an order of
magnitude slow down of the SHA-1 computation is something I would consider
disrespectful of those developers' time. Note: in this scenario, any
object coming from elsewhere would most likely be subject to the collision
detection, as the developer may not trust anybody but themselves.

In other words: I disagree that, say, `git add` should use the collision
detecting SHA-1.

I also suspect that you had a much more elaborate (maybe even fragile)
strategy than mine in mind when you tried to determine which code paths
would need collision detection and which ones would not: we have *no*
context in the object-oriented sense whenever we call the object hashing
functions. Meaning that you would have to introduce such a context, or to
add some sort of thread local state. I have to admit that I do not like
either way.

The approach I chose instead was to make the switch global, per command.
Obviously, the next step is to identify the Git commands which accept
objects from external sources (`clone`, `fetch`, `fast-import` and all the
remote helpers come to mind) and let them set a global flag before asking
git_default_config() to parse the core.enableSHA1DC setting, so that the
special value `external` could trigger the collision detecting code for
those, and only those, commands. That would still err on the safe side if
these commands are used to hash objects originating from the same machine,
but that is a price I would be willing to pay for the simplicity of this
approach.

Does my explanation manage to illustrate in a plausible way why I chose
the approach that I did?

Ciao,
Johannes

Footnote *1*: It is equally delusional, of course, to claim that every Git
user can and should configure and compile Git for themselves.