Re: Making bit-by-bit reproducible Git Bundles?

On Wed, Mar 12, 2025 at 4:59 AM Simon Josefsson <simon@xxxxxxxxxxxxx> wrote:
>
> Hi.
>
> Thank you for the "git-archive" and "git-bundle" features, making it
> easier to do source-based builds in a no-Internet environment.
>
> I have published a Git bundle of Gnulib:
>
> https://www.gnu.org/software/gnulib/manual/html_node/Gnulib-Git-Bundle.html
>
> As you can see at the end, I struggle to come up with a recipe to allow
> others to reproduce the git bundle that I created.
>
> If I run the recipe above twice (including the clone), I get different
> checksums.  This even if nothing was committed in the remote repository
> meanwhile.
>
> Is it possible to create a bit-by-bit reproducible git bundle using some
> other set of commands?  If so, how?  I'm using git 2.48.1 from Guix.
>
> Can anyone explain what is causing the irreproducibility?  Running
> diffoscope is not helpful, since the bundle is compressed and diffoscope
> doesn't seem to know how to untangle it.

I spent some time on this. When I followed the instructions, the
differences were in the pack file portion of the bundle file:
different "tree" objects appeared at different points in the pack
file. However, running `git bundle create` multiple times in the same
clone produces identical bundles. My guess is that the non-determinism
comes from the clone process being multi-threaded, so the order in
which objects are written to the filesystem varies from one clone to
the next. It could also come from gc. The contents of
.git/objects/pack have different hashes across my two clones, and I
haven't investigated why.

>
> If this is not possible today, what do you think about changes to make
> this work?

What is your end goal with being able to reproduce the bundles?
Bundles are just a list of refs and a pack file, I think. Reproducing
the bundle doesn't provide any more security than git provides when it
writes the pack file to disk - if you end up with commits with the
same hashes, the bundle has to be *effectively* the same as a git
clone of the repository.
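
To check where two bundles actually differ, it can help to split each
file into its ref list and its pack data and hash the two parts
separately. The following is a minimal sketch, assuming the documented
v2/v3 bundle layout (a signature line, optional "@" capability and "-"
prerequisite lines, ref lines, one blank line, then the pack). The
helper names `split_bundle` and `summarize` are made up for this
illustration and are not part of git.

```python
import hashlib
import struct


def split_bundle(data: bytes):
    """Split raw bundle bytes into (header_lines, pack_bytes).

    Assumes the v2/v3 bundle layout: text header lines, one empty
    line, then the pack data.
    """
    if not data.startswith((b"# v2 git bundle\n", b"# v3 git bundle\n")):
        raise ValueError("not a v2/v3 git bundle")
    # Header lines are never empty, so the first "\n\n" ends the header.
    end = data.index(b"\n\n")
    header = data[: end + 1]
    pack = data[end + 2 :]
    return header.decode("utf-8", "replace").splitlines(), pack


def summarize(data: bytes) -> dict:
    """Return the ref list plus separate digests for header and pack."""
    lines, pack = split_bundle(data)
    body = lines[1:]  # skip the signature line
    refs = [l for l in body if not l.startswith(("-", "@"))]
    prerequisites = [l[1:] for l in body if l.startswith("-")]
    object_count = None
    if len(pack) >= 12 and pack[:4] == b"PACK":
        # Pack header: "PACK", 4-byte version, 4-byte object count,
        # both integers in network byte order.
        _magic, _version, object_count = struct.unpack(">4sII", pack[:12])
    return {
        "refs": refs,
        "prerequisites": prerequisites,
        "object_count": object_count,
        "header_sha256": hashlib.sha256("\n".join(lines).encode()).hexdigest(),
        "pack_sha256": hashlib.sha256(pack).hexdigest(),
    }
```

Calling `summarize(open(path, "rb").read())` on two bundles made from
separate clones should, if the guess above is right, give the same
`refs` and `header_sha256` but a different `pack_sha256`. Whether the
packs hold the same set of objects is a separate question that this
sketch does not answer; `git bundle verify` and `git bundle list-heads`
are the existing commands for inspecting a bundle with git itself.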

Producing an identical bit-for-bit bundle might be doable by doing
some form of sorting of the objects in the pack file, but this would
only get us closer to bit-for-bit reproducibility *on the same machine
and versions of everything*. Changes to git, zlib, machine
architecture, etc. could still cause deterministic but different
output. For example, a future version of zlib might compress better,
producing a different compressed stream that still decompresses to the
same result.
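
The zlib point can be illustrated with a small sketch. It uses two
compression levels of the same zlib as a stand-in for two zlib
versions, which is an assumption made only for the demonstration; the
function name `compress_variants` and the sample payload are likewise
invented here.

```python
import hashlib
import zlib


def compress_variants(payload: bytes) -> dict:
    """Compress the same payload at two levels and compare the results.

    Two levels stand in for "two zlib versions": the compressed bytes
    may differ while the decompressed bytes stay identical.
    """
    fast = zlib.compress(payload, 1)
    best = zlib.compress(payload, 9)
    return {
        "fast_sha256": hashlib.sha256(fast).hexdigest(),
        "best_sha256": hashlib.sha256(best).hexdigest(),
        "same_compressed": fast == best,
        "same_decompressed": zlib.decompress(fast) == zlib.decompress(best),
    }


if __name__ == "__main__":
    sample = b"100644 blob 0123456789abcdef README\n" * 500
    for key, value in compress_variants(sample).items():
        print(key, value)
```

A checksum over the compressed stream is sensitive to this kind of
variation, while git object hashes are computed over the uncompressed
object contents and are not.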

>
> Thanks,
> /Simon




