Hi... On Thu, Sep 26, 2024 at 8:22 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: ... > I think most of the review dust has settled up to this point, so I'm > imagining that this is the final version of this series for now, or at > least very close to it. But if something new comes up, please let me > know! > ... > Range-diff against v4: > -: ---------- > 1: 6f1ee91fff finalize_object_file(): check for name collision before renaming > -: ---------- > 2: 133047ca8c finalize_object_file(): refactor unlink_or_warn() placement > 1: ed9eeef851 ! 3: 41d38352a4 finalize_object_file(): implement collision check > @@ Commit message > object name, so checking for collisions at the content level doesn't > add anything. > > - This is why we do not bother to check the inflated object contents > - for collisions either, since either (a) the object contents have the > - fingerprint of a SHA-1 collision, in which case the collision > - detecting SHA-1 implementation used to hash the contents to give us > - a path would have already rejected it, or (b) the contents are part > - of a colliding pair which does not bear the same fingerprints of > - known collision attacks, in which case we would not have caught it > - anyway. > + Adding a content-level collision check would have to happen at a > + higher level than in finalize_object_file(), since (avoiding race > + conditions) writing an object loose which already exists in the > + repository will prevent us from even reaching finalize_object_file() > + via the object freshening code. > + > + There is a collision check in index-pack via its `check_collision()` > + function, but there isn't an analogous function in unpack-objects, > + which just feeds the result to write_object_file(). > > So skipping the collision check here does not change for better or > worse the hardness of loose object writes. > @@ Commit message > > Co-authored-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: Jeff King <peff@xxxxxxxx> > + Helped-by: Elijah Newren <newren@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## object-file.c ## > 2: 3cc7f7b1f6 = 4: 611475d83e pack-objects: use finalize_object_file() to rename pack/idx/etc > 3: 8f8ac0f5b0 = 5: 9913a5d971 sha1: do not redefine `platform_SHA_CTX` and friends > 4: d300e9c688 = 6: 65de6d724d hash.h: scaffolding for _unsafe hashing variants > 5: af8fd9aa4e = 7: 3884cd0e3a Makefile: allow specifying a SHA-1 for non-cryptographic uses > 6: 4b83dd05e9 ! 8: 62abddf73d csum-file.c: use unsafe SHA-1 implementation when available > @@ Commit message > > These callers only use the_hash_algo to produce a checksum, which we > depend on for data integrity, but not for cryptographic purposes, so > - these callers are safe to use the unsafe (and potentially non-collision > - detecting) SHA-1 implementation. > + these callers are safe to use the unsafe (non-collision detecting) SHA-1 > + implementation. > > To time this, I took a freshly packed copy of linux.git, and ran the > following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both > @@ Commit message > $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1 > 59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects] > > - , and generate the resulting "clone" much unsafeer, in only 11.597 seconds > + , and generate the resulting "clone" much faster, in only 11.597 seconds > of wall time, 11.37 seconds of user time, and 0.23 seconds of system > time, for a ~40% speed-up. This round looks good to me.