On Sun, Oct 16, 2022 at 11:57:40PM +0200, kpcyrd wrote:

> multiple people in Arch Linux noticed the output of our `git archive`
> command doesn't match the tarball served by github anymore.
>
> First I suspected an update in our gzip package until I found this line in
> the git 2.38.0 release notes:
>
> >  * Teach "git archive" to (optionally and then by default) avoid
> >    spawning an external "gzip" process when creating ".tar.gz" (and
> >    ".tgz") archives.
>
> I've then found this commit that could be considered a breaking change in
> `git archive`:
>
> https://github.com/git/git/commit/4f4be00d302bc52d0d9d5a3d4738bb525066c710
>
> I don't know if there's some kind of gzip standard that could be used to
> align the git internal gzip implementation with gnu gzip.

Interesting. For a small input, they seem to produce the same file for
me:

  git init repo
  cd repo
  seq 1000 >file
  git add file
  git commit -m foo
  git -c tar.tar.gz.command='git archive gzip' \
    archive --format=tar.gz HEAD >internal.tar.gz
  git -c tar.tar.gz.command='gzip -cn' \
    archive --format=tar.gz HEAD >external.tar.gz
  cmp internal.tar.gz external.tar.gz && echo ok

but if I instead do "seq 10000", then the files differ. I didn't dig
into the actual binary to see the source of the change. It might be
something we can tweak (e.g., if it's how a header is represented, or if
we can change the zlib parameters so that it produces the same
compressed output).

> I'm not saying this is necessarily a bug or regression but it makes it
> harder to reproduce github tar balls from a git repository. Just sharing
> what I've debugged. :)

I don't think we make promises about stable output from "git archive".
We've fixed bugs in the tar-generating side before that led to changes.
But if we can easily make them the same, that might be worth doing.

In the meantime, you can use the config option I showed above to get the
old, external behavior.
At some point GitHub will probably update their version, at which point
you'd want the internal one (though they may also try to retain the old
behavior; lots of distro/packaging projects break when GitHub's archives
aren't byte-for-byte identical).

-Peff
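
A rough sketch of how one might check whether the two archives differ in
the gzip header or only in the compressed stream. The helper names
gz_header and first_diff are made up for illustration; the only
assumption about the format is the 10-byte fixed header from RFC 1952.
It has not been run against the archives above, so it makes no claim
about where they actually differ.

```shell
# Hypothetical helpers, not part of git or gzip.

# Print the 10-byte fixed gzip header of each file as hex.
# Field order per RFC 1952: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
gz_header() {
  for f in "$@"; do
    printf '%s:' "$f"
    od -An -tx1 -N10 "$f" | tr -s ' \n' ' '
    printf '\n'
  done
}

# Print the 1-based offset of the first differing byte, or nothing if
# the files are identical. An offset of 10 or less is inside the fixed
# header; anything later is in the deflate stream or the trailer.
first_diff() {
  cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}'
}
```

Usage would be something like:

  gz_header internal.tar.gz external.tar.gz
  first_diff internal.tar.gz external.tar.gz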