On Thu, Mar 13, 2025 at 1:18 PM Simon Josefsson <simon@xxxxxxxxxxxxx> wrote:
>
> Jeff King <peff@xxxxxxxx> writes:
>
> > [now without threading]
> > $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
> > c897caf9c68d2c37d997d3973196886af3b0b46e -
> >
> > [and we can do it again. yay!]
> > $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
> > c897caf9c68d2c37d997d3973196886af3b0b46e -
>
> That's the commands I use -- it doesn't lead to the same hash in two
> different 'git clone's. I tried running 'git clone' with the same '-c
> pack.threads=1' but it made no difference.
>
> >   2. There is no way to pass pack-objects options down through
> >      git-bundle. So you'd have to either assemble the bundle yourself,
> >      or perhaps generate a stable on-disk pack state, and then generate
> >      the bundle. Perhaps something like:
> >
> >        # make one single pack, with no reuse, using the default options
> >        git -c pack.threads=1 repack -adf
>
> Yay! You may have solved this for me. I have to verify this a bit
> more, but this looks promising (these are two different git clones):
>
> jas@kaka:~/t/gnulib-1$ git -c pack.threads=1 repack -adf
> jas@kaka:~/t/gnulib-1$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
> jas@kaka:~/t/gnulib-1$ sha256sum gnulib.bundle
> c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890 gnulib.bundle
> jas@kaka:~/t/gnulib-1$ cd ../gnulib-2
> jas@kaka:~/t/gnulib-2$ git -c pack.threads=1 repack -adf
> jas@kaka:~/t/gnulib-2$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
> jas@kaka:~/t/gnulib-2$ sha256sum gnulib.bundle
> c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890 gnulib.bundle
> jas@kaka:~/t/gnulib-2$
>
> > So I think it's possible, but I doubt it's very ergonomic. You're
> > probably better off using some checksum over Git's logical model, rather
> > than the stored bytes.
> > The obvious one is that a single Git commit hash
> > unambiguously represents the whole tree and all of history leading up to
> > it, because of the chains of hashes.
> >
> > But that implies you trust Git's object hash algorithm.
>
> Right -- I think anything but bit-by-bit identical files is going to be
> too complex to verify.

I'm curious what specific attacks you're trying to catch here, because
to get into a situation where you unbundle the bundle and end up with the
same commit hash but different contents, you would need a collision in
the SHA-1 hash of some object (or in the SHA-256 hash, if the repo uses
that). If you're also providing the instructions for validating that the
bundle is legitimate (or even just the commit hash and a server to clone
from, with a link to instructions maintained elsewhere), it seems MUCH
easier for an attacker to replace those validation instructions with ones
pointing to a commit/server that has already been backdoored than to
generate a SHA-1 collision that would go undetected.

>
> >   # print all commits in topological order, with ties broken by
> >   # committer date, which should be stable. And then follow up with the
> >   # trees and blobs for each.
> >   git rev-list --topo-order --objects HEAD >objects
> >
> >   # now print the contents of each object (preceded by its name, type,
> >   # and length, so there's no chance of weird prepending or appending
> >   # attacks). We cut off the path information from rev-list here, since
> >   # the ordered set of objects is all we care about.
> >   cut -d' ' -f1 objects |
> >   git cat-file --batch >content
> >
> >   # and then take a hash over that content; this will be unambiguous.
> >   sha256sum <content
>
> How to read this output? Could this be made git bundle compatible?
>
> But if the above solves it, this part isn't necessary.
>
> /Simon
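On the "how to read this output" and bundle-compatibility questions: the
rev-list/cat-file recipe ends in a single SHA-256 digest over the reachable
objects themselves, so it shouldn't care how the pack inside a bundle was
built. Here is a rough sketch of applying it to a bundle, assuming a POSIX
shell, git and GNU coreutils' sha256sum on PATH (macOS would need
`shasum -a 256`), and a SHA-1 repository. The function names
logical_checksum, bundle_checksum and bundle_has_commit are made up for
illustration; the idea is just to unpack the bundle's objects into a scratch
bare repo with `git bundle unbundle` and run the same pipeline there.

```shell
#!/bin/sh
# Rough sketch: checksum Git's logical object model (the objects
# themselves) so the result does not depend on how a pack or bundle
# happened to be written.
# Assumes: git and GNU sha256sum on PATH, SHA-1 object format.
# The function names are hypothetical helpers, not git commands.
set -eu

# logical_checksum REPO REV
# Same pipeline as the rev-list | cut | cat-file --batch | sha256sum
# recipe: list every object reachable from REV in --topo-order, dump
# each one preceded by its name, type and size, and hash the stream.
# Prints only the hex digest.
logical_checksum() {
    git -C "$1" rev-list --topo-order --objects "$2" |
        cut -d' ' -f1 |
        git -C "$1" cat-file --batch |
        sha256sum |
        cut -d' ' -f1
}

# bundle_checksum BUNDLE REV
# Stores BUNDLE's objects in a scratch bare repository via
# "git bundle unbundle", then runs logical_checksum there.
bundle_checksum() {
    # git -C changes directory, so make the bundle path absolute first.
    case $1 in
        /*) bundle=$1 ;;
        *) bundle=$PWD/$1 ;;
    esac
    tmp=$(mktemp -d)
    git init -q --bare "$tmp/repo.git"
    # unbundle prints the bundle's refs on stdout; we only want the objects.
    git -C "$tmp/repo.git" bundle unbundle "$bundle" >/dev/null
    logical_checksum "$tmp/repo.git" "$2"
    rm -rf "$tmp"
}

# bundle_has_commit BUNDLE COMMIT
# Succeeds if some ref recorded in BUNDLE has object id COMMIT.
bundle_has_commit() {
    git bundle list-heads "$1" | cut -d' ' -f1 | grep -qxF "$2"
}
```

The publisher would run something like `logical_checksum . "$commit"` and
post that digest next to the commit hash; a consumer compares it with
`bundle_checksum gnulib.bundle "$commit"`. Two caveats: a SHA-256 repository
would need `git init --object-format=sha256` for the scratch repo, and the
digest only covers what is reachable from that one commit, not every ref in
an `--all` bundle.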