Here is a minor reroll of mine and Peff's series to add a build-time knob to allow selecting an alternative SHA-1 implementation for non-cryptographic hashing within Git, starting with the `hashwrite()` family of functions. This version has changes which are limited to the commit messages only, and amount to: - Updated rationale for skipping the collision check from within finalize_object_file() when handling loose objects. - Updated commit message with some over-eager s/fast/unsafe/ conversions in the final patch. I think most of the review dust has settled up to this point, so I'm imagining that this is the final version of this series for now, or at least very close to it. But if something new comes up, please let me know! Thanks in advance for your review! Taylor Blau (8): finalize_object_file(): check for name collision before renaming finalize_object_file(): refactor unlink_or_warn() placement finalize_object_file(): implement collision check pack-objects: use finalize_object_file() to rename pack/idx/etc sha1: do not redefine `platform_SHA_CTX` and friends hash.h: scaffolding for _unsafe hashing variants Makefile: allow specifying a SHA-1 for non-cryptographic uses csum-file.c: use unsafe SHA-1 implementation when available Makefile | 25 ++++++ block-sha1/sha1.h | 2 + csum-file.c | 18 ++-- hash.h | 72 +++++++++++++++ object-file.c | 124 ++++++++++++++++++++++++-- object-file.h | 6 ++ pack-write.c | 7 +- sha1/openssl.h | 2 + sha1dc_git.h | 3 + t/t5303-pack-corruption-resilience.sh | 7 +- tmp-objdir.c | 26 ++++-- 11 files changed, 266 insertions(+), 26 deletions(-) Range-diff against v4: -: ---------- > 1: 6f1ee91fff finalize_object_file(): check for name collision before renaming -: ---------- > 2: 133047ca8c finalize_object_file(): refactor unlink_or_warn() placement 1: ed9eeef851 ! 3: 41d38352a4 finalize_object_file(): implement collision check @@ Commit message object name, so checking for collisions at the content level doesn't add anything. - This is why we do not bother to check the inflated object contents - for collisions either, since either (a) the object contents have the - fingerprint of a SHA-1 collision, in which case the collision - detecting SHA-1 implementation used to hash the contents to give us - a path would have already rejected it, or (b) the contents are part - of a colliding pair which does not bear the same fingerprints of - known collision attacks, in which case we would not have caught it - anyway. + Adding a content-level collision check would have to happen at a + higher level than in finalize_object_file(), since (avoiding race + conditions) writing an object loose which already exists in the + repository will prevent us from even reaching finalize_object_file() + via the object freshening code. + + There is a collision check in index-pack via its `check_collision()` + function, but there isn't an analogous function in unpack-objects, + which just feeds the result to write_object_file(). So skipping the collision check here does not change for better or worse the hardness of loose object writes. @@ Commit message Co-authored-by: Jeff King <peff@xxxxxxxx> Signed-off-by: Jeff King <peff@xxxxxxxx> + Helped-by: Elijah Newren <newren@xxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> ## object-file.c ## 2: 3cc7f7b1f6 = 4: 611475d83e pack-objects: use finalize_object_file() to rename pack/idx/etc 3: 8f8ac0f5b0 = 5: 9913a5d971 sha1: do not redefine `platform_SHA_CTX` and friends 4: d300e9c688 = 6: 65de6d724d hash.h: scaffolding for _unsafe hashing variants 5: af8fd9aa4e = 7: 3884cd0e3a Makefile: allow specifying a SHA-1 for non-cryptographic uses 6: 4b83dd05e9 ! 8: 62abddf73d csum-file.c: use unsafe SHA-1 implementation when available @@ Commit message These callers only use the_hash_algo to produce a checksum, which we depend on for data integrity, but not for cryptographic purposes, so - these callers are safe to use the unsafe (and potentially non-collision - detecting) SHA-1 implementation. + these callers are safe to use the unsafe (non-collision detecting) SHA-1 + implementation. To time this, I took a freshly packed copy of linux.git, and ran the following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both @@ Commit message $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1 59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects] - , and generate the resulting "clone" much unsafeer, in only 11.597 seconds + , and generate the resulting "clone" much faster, in only 11.597 seconds of wall time, 11.37 seconds of user time, and 0.23 seconds of system time, for a ~40% speed-up. base-commit: 6258f68c3c1092c901337895c864073dcdea9213 -- 2.46.1.507.gffd0c9a15b2.dirty