On Wed, Mar 12, 2025 at 4:59 AM Simon Josefsson <simon@xxxxxxxxxxxxx> wrote:
>
> Hi.
>
> Thank you for the "git-archive" and "git-bundle" features, making it
> easier to do source-based builds in a no-Internet environment.
>
> I have published a Git bundle of Gnulib:
>
> https://www.gnu.org/software/gnulib/manual/html_node/Gnulib-Git-Bundle.html
>
> As you can see at the end, I struggle to come up with a recipe to allow
> others to reproduce the git bundle that I created.
>
> If I run the recipe above twice (including the clone), I get different
> checksums. This even if nothing was committed in the remote repository
> meanwhile.
>
> Is it possible to create a bit-by-bit reproducible git bundle using some
> other set of commands? If so, how? I'm using git 2.48.1 from Guix.
>
> Can anyone explain what is causing the irreproducibility? Running
> diffoscope is not helpful, since the bundle is compressed and diffoscope
> doesn't seem to know how to untangle it.

I spent some time on this. When I followed the instructions, the diffs
were in the pack file portion of the bundle file: different "tree"
objects appeared at different points in the pack file. However,
`git bundle create` produces identical bundles if I run it multiple
times in the same clone.

My guess is that the non-determinism comes from the clone being
multi-threaded, so the order in which things are created in the
filesystem during the clone (or maybe during gc?) varies from run to
run. The contents of .git/objects/pack have different hashes across my
two clones, and I haven't investigated why.

> If this is not possible today, what do you think about changes to make
> this work?

What is your end goal with being able to reproduce the bundles? Bundles
are just a list of refs and a pack file, I think.
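
To make the "list of refs plus a pack file" point concrete, here is a
rough Python sketch that splits a bundle into its text header and its
pack data and hashes each part separately. It assumes the layout
described in gitformat-bundle(5): a "# v2 git bundle" or
"# v3 git bundle" signature line, the header lines, an empty line, then
the pack. The function names are made up for illustration.

```python
import hashlib


def split_bundle(data: bytes) -> tuple[bytes, bytes]:
    """Split raw bundle bytes into (header, pack).

    Assumes the header is text terminated by an empty line and that
    the pack data follows immediately after it.
    """
    end = data.find(b"\n\n")
    if end == -1:
        raise ValueError("no empty line terminating the bundle header")
    header, pack = data[:end + 1], data[end + 2:]
    if not header.startswith(b"# v"):
        raise ValueError("missing bundle signature line")
    if not pack.startswith(b"PACK"):
        raise ValueError("pack data does not start with 'PACK'")
    return header, pack


def part_digests(data: bytes) -> dict[str, str]:
    """Return SHA-256 digests of the header and the pack separately."""
    header, pack = split_bundle(data)
    return {
        "header": hashlib.sha256(header).hexdigest(),
        "pack": hashlib.sha256(pack).hexdigest(),
    }
```

Running part_digests() on the bytes of two bundles made from separate
clones should show whether only the "pack" digest differs, which is what
I would expect from the diffs I saw.
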
Reproducing the bundle doesn't provide any more security than git
provides when it writes the pack file to disk. If you end up with
commits with the same hashes, the bundle has to be *effectively* the
same as a git clone of the repository.

Producing an identical bit-for-bit bundle might be doable by doing some
form of sorting of the objects in the pack file, but this would only get
us closer to bit-for-bit reproducibility *on the same machine and
versions of everything*. There could be changes to git, zlib, machine
architecture, etc. that cause deterministic but different values to be
produced. As an example, maybe future versions of zlib compress better,
producing an equal result when decompressed, but a different compressed
result.

>
> Thanks,
> /Simon
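
P.S. A small Python sketch of the zlib point, in case it helps. I can't
run a future zlib, so this stands in two compression levels of the same
zlib for "two versions"; the payload is made up.

```python
import zlib

# Made-up payload standing in for object contents.
payload = b"tree and blob contents would go here\n" * 200

# Two compression levels stand in for two hypothetical zlib versions.
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)

# Both streams decompress to exactly the same content...
same_content = zlib.decompress(fast) == payload == zlib.decompress(best)

# ...but the compressed bytes themselves need not match.
same_bytes = fast == best

print("same content after decompression:", same_content)
print("same compressed bytes:", same_bytes)
```

The content survives either way, but a checksum over the compressed
stream does not have to.
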