Hi Ævar On 02/02/2023 09:32, Ævar Arnfjörð Bjarmason wrote:
As reported in https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@xxxxxxxxx/ changing the default "tgz" output method of from "gzip(1)" to our internal "git archive gzip" (using zlib ) broke things for users in the wild that assume that the "git archive" output is stable, most notably GitHub: https://github.com/orgs/community/discussions/45830
>
Leaving aside the larger question of whether we're going to promise output stability for "git archive" in general, the motivation for that change was to have a working compression method on systems that lacked a gzip(1).
As I recall the reduction in cpu time used to create a compressed archive was a factor in making it the default.
As the disruption of changing the default isn't worth it, let's use gzip(1) again by default, and only fall back on the new "git archive gzip" if it isn't available.
Playing devil's advocate for a moment as we're not going to promise that the compressed output of "git archive" will be stable in the future perhaps we should use this breakage as an opportunity to highlight that to users and to advertize the config setting that allows them to use gzip for compressing archives. Reverting the change gives the misleading impression that we're making a commitment to keeping the output stable. The focus of this thread seems to be the problems relating to github which they have already addressed.
I think there is general agreement that it is not practical to promise that the compressed output of "git archive" is stable so maybe it is better to make that clear now while users can work around it in the short term with a config setting rather than waiting until we're faced with some security or other issue that forces a change to the output which users cannot work around so easily.
Best Wishes Phillip
The later parts of this series then document and test for the output stability of the command. We're not promising anything new there, except that we now promise that we're going to use "gzip" as the default compressor, but that it's up to that command to be stable, should the user desire output stability. The documentation discusses the various caveats involved, suggests alternatives to checksumming compressed archives, but in the end notes what's been the policy so far: We're not promising that the "tar" output is going to be stable. The early parts of this series (1-2/9) are clean-up for existing config drift, as later in the series we'll otherwise need to change the divergent config documentation in two places. CI & branch for this at: https://github.com/avar/git/tree/avar/archive-internal-gzip-not-the-default Ævar Arnfjörð Bjarmason (9): archive & tar config docs: de-duplicate configuration section git config docs: document "tar.<format>.{command,remote}" archiver API: make the "flags" in "struct archiver" an enum archive: omit the shell for built-in "command" filters archive-tar.c: move internal gzip implementation to a function archive: use "gzip -cn" for stability, not "git archive gzip" test-lib.sh: add a lazy GZIP prerequisite archive tests: test for "gzip -cn" and "git archive gzip" stability git archive docs: document output non-stability Documentation/config/tar.txt | 29 +++++++- Documentation/git-archive.txt | 96 +++++++++++++++++++------- archive-tar.c | 78 ++++++++++++++------- archive.h | 11 +-- t/t5000-tar-tree.sh | 2 - t/t5005-archive-stability.sh | 70 +++++++++++++++++++ t/t5562-http-backend-content-length.sh | 2 - t/test-lib.sh | 4 ++ 8 files changed, 231 insertions(+), 61 deletions(-) create mode 100755 t/t5005-archive-stability.sh