Hi Junio, On Fri, 24 Mar 2017, Junio C Hamano wrote: > Johannes Schindelin <johannes.schindelin@xxxxxx> writes: > > > - the most important part will be the patch turning core.enableSHA1DC > > into a tristate: "externalOnly" or "smart" or "auto" or something > > indicating that it switches on collision detection only for commands > > that accept objects from an outside source into the local repository, > > such as fetch, fast-import, etc > > There are different uses of SHA-1 hashing in Git, and I do agree > that depending on the use, some of them do not need the overhead for > the collision-attack detection. Indeed. I guess I should have clarified very clearly what I intended to accomplish with this preview. Let me first summarize the background: - Git originally used SHA-1 as a convenient catch-all hashing algorithm, with the security of the hash merely being a nice afterthought. Unfortunately, SHA-1 was hardcoded. Mistakes were made. They always are. - After the SHAttered blog post became public, Linus first made the case that it matters not all that much: the development of the Linux kernel is based on trust, and nobody would pull from a person they do not trust. This approach does obviously not extend to most other projects. - By switching the default to DC_SHA1 already in `master`, you now took the exact opposite position: it *always* matters, even when you trust people, and the 6x slowdown demonstrated by my perf test is something that everybody needs to accept, even if it is spent for no benefit in return. Between these two extremes ("collision attacks do not matter if you run your project based on trust" vs "we always mistrust everybody, including the user's own source code that they stage for commit"), I think there are many shades of green, and I think it would be delusional to believe that we can predict the trust model for each and every Git user [*1*], baking it into a single compile time setting. That is why I wanted to implement a tristate config so that users can adapt Git to their particular scenario. That way, maintainers of precompiled Git packages do not have to dictate to Git users what trust model they should use. One scenario seems to be common, and it is certainly one that I have a direct interest in supporting: inside a company, where the server as well as the developers are implicitly trusted not to fiddle with collision attacks (because there are much easier ways to hide malicious code, let's be frank). And in this scenario, slowing down the SHA-1 computation half an order of magnitude by trying to detect collision attacks is simply unacceptable, because there is no benefit, only cost to it. An even more common scenario is when a developer works on a local repository, adds a couple of files, then runs rebase, then merges, etc. In *none* of these cases does the developer distrust any of the objects flying about. Like above, forcing the developer to accept half an order of magnitude slow down of the SHA-1 computation is something I would consider disrespectful of those developers' time. Note: in this scenario, any object coming from elsewhere would most likely be subject to the collision detection, as the developer may not trust anybody but themselves. In other words: I disagree that, say, `git add` should use the collision detecting SHA-1. I also suspect that you had a much more elaborate (maybe even fragile) strategy than mine in mind when you tried to determine which code paths would need collision detection and which ones would not: we have *no* context in the object-oriented sense whenever we call the object hashing functions. Meaning that you would have to introduce such a context, or to add some sort of thread local state. I have to admit that I do not like either way. The approach I chose instead was to make the switch global, per command. Obviously, the next step is to identify the Git commands which accept objects from external sources (`clone`, `fetch`, `fast-import` and all the remote helpers come to mind) and let them set a global flag before asking git_default_config() to parse the core.enableSHA1DC setting, so that the special value `external` could trigger the collision detecting code for those, and only those, commands. That would still err on the safe side if these commands are used to hash objects originating from the same machine, but that is a price I would be willing to pay for the simplicity of this approach. Does my explanation manage to illustrate in a plausible way why I chose the approach that I did? Ciao, Johannes Footnote *1*: It is equally delusional, of course, to claim that every Git user can and should configure and compile Git for themselves.