On Wed, 1 Feb 2023 at 14:49, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
>
> On Wed, Feb 01 2023, demerphq wrote:
>
> > On Wed, 1 Feb 2023, 20:21 Michal Suchánek, <msuchanek@xxxxxxx> wrote:
> >>
> >> On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
> >> > Why does it have to be gzip? It is not that hard to come up with a
> >
> >> historical reasons?
> >
> > Currently git doesn't advertise that archive creation is stable
> > right[1]? So I wrote that with the assumption that this new
> > compression would only be used when making a new archive with a
> > hypothetical new '--stable' option. So historical reasons don't come
> > up. Or was there some other form of history that you meant?
>
> We haven't advertised it, but people have come to rely on it, as the
> widespread breakages reported when upgrading to v2.38.0 at the start of
> this thread show.
>
> That's unfortunate, and those people probably shouldn't have done that,
> but that's water under the bridge. I think it would be irresponsible to
> change the output willy-nilly at this point, especially when it seems
> rather easy to find some compromise everyone will be happy with.
>
> > I'm just trying to point out here that stable compression is doable
> > and doesn't need to be as complex as specifying a stable gzip format.
> > I am not even saying git should just do this, just that it /could/ if
> > it decided that stability was important, and that doing so wouldn't
> > involve the complexity that Avar was implying would be needed. Simple
> > compression like LZ variants are pretty straightforward to implement,
> > achieve pretty good compression and can run pretty fast.
> >
> > Yves
> > [1] if it did the issue kicking off this thread would not have
> > happened as there would be a test that would have noticed the change.
>
> I have some patches I'm about to submit to address issues in this
> thread, and it does add *a* test for archive output stability.
>
> But I'm not at all confident that it's exhaustive. I just found it by
> experiment, by locating tests of ours where the "git archive" output at
> the end is different with gzip and "git archive gzip".
>
> But is it guaranteed to find all potential cases where repository
> content might trigger different output with different gzip
> implementations? I don't know, but probably not.

BTW, I just happened to be looking at the zstd docs (I am updating
code that uses it), and I saw this:

Zstandard's format is stable and documented in
[RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple
independent implementations are already available. This repository
represents the reference implementation, provided as an open-source
dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C** library, and a
command line utility producing and decoding `.zst`, `.gz`, `.xz` and
`.lz4` files. Should your project require another programming
language, a list of known ports and bindings is provided on
[Zstandard homepage](http://www.zstd.net/#other-languages).

So it sounds like that is a spec you could use. Not sure exactly what
they mean by "stable", but given the .gz compatibility maybe it would
be worth considering. It's a lot faster than zlib. (The library I
support includes Snappy, Zlib, and Zstd, and the last of these is
faster and better than the other two.)

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
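
For illustration only, here is a minimal sketch of why checksums over compressed bytes are fragile while checksums over the decompressed content are not. It is not taken from the patches mentioned above. It assumes Python's zlib at two compression levels as a stand-in for two gzip implementations, and `payload` is a hypothetical placeholder for an uncompressed tar stream.

```python
import hashlib
import zlib

# Hypothetical stand-in for the bytes of an uncompressed tar archive.
payload = b"example tar stream contents\n" * 2000

# Two compressor settings, standing in (as an assumption) for two
# different gzip implementations producing valid but different output.
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)


def sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


# Both streams decode to exactly the same content.
same_content = zlib.decompress(fast) == zlib.decompress(best) == payload

# A digest of the decompressed content does not depend on the compressor.
stable_digest = sha256(zlib.decompress(fast)) == sha256(zlib.decompress(best))

# A digest of the compressed bytes depends on which compressor ran.
compressed_digests_differ = sha256(fast) != sha256(best)

print(same_content, stable_digest, compressed_digests_differ)
```

Anything that pins a hash of the compressed file implicitly relies on one particular compressor and its settings, which is the kind of dependence discussed in this thread.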