On Sun Jun 16, 2024 at 10:22 AM IST, Jeff King wrote:
> On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote:
>
> > On Fri, 24 May 2024, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > > Christian Couder <christian.couder@xxxxxxxxx> writes:
> > >
> > > >> Can we refactor this test to stop doing that? E.g., would it work if we
> > > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> > > >> could get rid of the helper altogether, as far as I understand.
> > > >
> > > > It could perhaps work if we used git-hash-object(1) instead of
> > > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > > > but we could do that in a separate patch or patch series.
> > >
> > > Yeah, I think such a plan to make preliminary refactoring as a
> > > separate series, and then have another series to get rid of
> > > "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> > > would work well.
> >
> > It seems that git-hash-object does not die (or give an error) when
> > providing t0013/shattered-1.pdf, and gives a different hash than the
> > one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently
> > replacing the hash when it detects the collision. Is this an expected
> > behaviour?
>
> The shattered files do not create a collision (nor trigger the detection
> in sha1dc) when hashed as Git objects. The reason is that Git objects
> are not a straight hash of the contents, but have the object type and
> size prepended. One _could_ use the same techniques that created the
> shattered files to create a colliding set of Git objects, but AFAIK
> nobody has done so (and it probably costs tens of thousands of USD,
> though perhaps getting cheaper every year).
>
> So no, git-hash-object can't be used to test this. You have to directly
> hash some contents with sha1, and I don't think there is any way to do
> that with regular Git commands. Anything working with objects will use
> the type+size format. We also use sha1 for the csum-file.[ch] mechanism,
> where it is a straight hash of the contents (and we use this for
> packfiles, etc). But there's not an easy way to feed an arbitrary file
> to that system.
>
> It's possible there might be a way to abuse hashfd_check() to feed an
> arbitrary file. E.g., stick shattered-1.pdf into a .pack file or
> something, then ask "index-pack --verify" to check it. But I don't think
> even that works, because before we even get to the final checksum, we're
> verifying the actual contents as we go.
>
> So I think we need to keep some mechanism for computing the sha1 of
> arbitrary contents.

Thank you for the detailed explanation. Then I suppose we should keep
these helpers (test-{sha1, sha256, hash}) as it is.
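
For reference, the type+size point quoted above can be illustrated outside of Git. This is only a minimal sketch, assuming Python's hashlib: it uses plain SHA-1 rather than sha1dc, so it performs no collision detection, and the function names raw_sha1 and git_blob_sha1 are made up for the illustration. It compares a straight hash of some bytes with a hash of the same bytes behind a "blob <size>\0" header, which is the form used for blob objects.

```python
import hashlib


def raw_sha1(data: bytes) -> str:
    """Straight SHA-1 of the bytes, with nothing prepended."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 of the bytes preceded by a blob header: type, size, NUL."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sample = b"hello\n"
    # The two digests differ because the header changes the hashed input.
    print("raw :", raw_sha1(sample))
    print("blob:", git_blob_sha1(sample))
```

Because the header is hashed first, two files that collide as raw bytes need not collide as blobs, which would be consistent with git-hash-object printing an ordinary, different ID for shattered-1.pdf.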