On Wed, Mar 12, 2025 at 12:40:05PM +0100, Simon Josefsson wrote:

> If I run the recipe above twice (including the clone), I get different
> checksums. This even if nothing was committed in the remote repository
> meanwhile.
>
> Is it possible to create a bit-by-bit reproducible git bundle using some
> other set of commands? If so, how? I'm using git 2.48.1 from Guix.

As Junio noted, multithreading is the first problem. E.g., here are some
commands on git.git, using my 8-core machine:

  [try once...]
  $ git bundle create --no-progress - HEAD | sha1sum
  686da850200da487032c9d91bdc544b605a3e426  -

  [and again; oops, it's different]
  $ git bundle create --no-progress - HEAD | sha1sum
  70b018c16d244f32b36e55deb931e29ae15506e3  -

  [now without threading]
  $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
  c897caf9c68d2c37d997d3973196886af3b0b46e  -

  [and we can do it again. yay!]
  $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
  c897caf9c68d2c37d997d3973196886af3b0b46e  -

What's happening here is that the bundle mostly consists of a packfile,
where many objects will be stored as deltas against others. The search
for deltas is multi-threaded, so it will find slightly different ones
each time (there surely is an "optimal" answer, but finding it is much
too expensive, so we bound the search with some heuristics).

So disabling threading gives you a deterministic answer. But that's not
the end of the story! We only search for deltas of objects that are not
already stored as deltas in on-disk packfiles. We try to reuse any
deltas we have already on disk (assuming that both the delta and its
base are going to be in the output).

There are options to ask pack-objects (the command which git-bundle uses
under the hood to generate the pack) not to reuse deltas. So
pack-objects running on a single thread without any delta reuse should
generate a deterministic pack. But there are some gotchas:

  1. It's stable only for a given Git version, and with a particular set
     of delta window/depth options. I wouldn't expect behavior to change
     much between versions, but it's not something that we try to
     guarantee.

  2. There is no way to pass pack-objects options down through
     git-bundle. So you'd have to either assemble the bundle yourself,
     or perhaps generate a stable on-disk pack state, and then generate
     the bundle. Perhaps something like:

       # make one single pack, with no reuse, using the default options
       git -c pack.threads=1 repack -adf

       # now we can make a bundle from that. We probably do not even
       # need to disable threads here, since we'd just be picking the
       # deltas from the on-disk file (assuming that you're including
       # all objects in the bundle)
       git bundle create - --all | sha1sum

  3. It will be really slow. We're throwing out all of the deltas and
     searching from scratch. And doing it single-threaded. I didn't time
     it, but I'd guess from past experience we're talking about hours to
     generate the bundle for something like linux.git.

So I think it's possible, but I doubt it's very ergonomic. You're
probably better off using some checksum over Git's logical model, rather
than the stored bytes.

The obvious one is that a single Git commit hash unambiguously
represents the whole tree and all of history leading up to it, because
of the chains of hashes. But that implies you trust Git's object hash
algorithm. If you don't trust sha1 (and don't want to try out the sha256
support), then you'd have to design something else. Perhaps something
like:

  # print all commits in topological order, with ties broken by
  # committer date, which should be stable. And then follow up with the
  # trees and blobs for each.
  git rev-list --topo-order --objects HEAD >objects

  # now print the contents of each object (preceded by its name, type,
  # and length, so there's no chance of weird prepending or appending
  # attacks). We cut off the path information from rev-list here, since
  # the ordered set of objects is all we care about.
  cut -d' ' -f1 objects | git cat-file --batch >content

  # and then take a hash over that content; this will be unambiguous.
  sha256sum <content

-Peff
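
To check that a checksum over the logical model does not depend on how
the objects happen to be packed, the last recipe can be wrapped in a
function and run before and after a repack. This is only a sketch under
some assumptions: logical_checksum is a made-up name (not a Git
command), and it expects a POSIX shell plus sha256sum from coreutils.

```shell
# Hypothetical helper, not part of Git: hash the objects reachable from
# a revision (default HEAD) instead of the stored pack bytes.
logical_checksum() {
    # Refuse to hash empty input if the revision does not resolve to a
    # commit; otherwise a typo would still produce a "valid" checksum.
    git rev-parse --verify --quiet "${1:-HEAD}^{commit}" >/dev/null ||
        return 1

    # Same pipeline as the recipe above, without temporary files:
    # ordered object list -> drop path names -> "<name> <type> <size>"
    # header plus contents for each object -> one sha256 over the
    # stream. The final cut keeps only the hex digest.
    git rev-list --topo-order --objects "${1:-HEAD}" |
        cut -d' ' -f1 |
        git cat-file --batch |
        sha256sum |
        cut -d' ' -f1
}
```

Run inside a repository, "logical_checksum HEAD" should print the same
value before and after "git -c pack.threads=1 repack -adf", and also in
a fresh clone of the same history, since only object names, types,
sizes, and contents feed the hash.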