On 2023-02-02 at 09:32:29, Ævar Arnfjörð Bjarmason wrote:
> +[[STABILITY]]
> +OUTPUT STABILITY
> +----------------
> +
> +The output of 'git archive' is not guaranteed to be stable, and may
> +change between versions.
> +
> +There are many valid ways to encode the same data in the tar format
> +itself. For non-`tar` arguments to the `--format` option we rely on
> +external tools (or libraries) for compressing the output we generate.
> +
> +The `tar` format contains the commit ID in the pax header (see the
> +<<DESCRIPTION>> section above). A repository that's been migrated from
> +SHA-1 to SHA-256 will therefore have different `tar` output for the
> +"same" commit. See `extension.objectFormat` in linkgit:git-config[1].
> +
> +Instead of relying on the output of `git archive`, you should prefer
> +to stick to git's own transport protocols, and e.g. validate releases
> +with linkgit:git-tag[1]'s `--verify` option.
> +
> +Despite the output of `git archive` having never been promised to be
> +stable, various users in the wild have come to rely on that being the
> +case.
> +
> +Most notably, large hosting providers provide a way to download a
> +given tagged release as a `git archive`. Some downstream tools then
> +expect the content of that archive to be stable. When that's changed
> +widespread breakage has been observed, see
> +https://github.com/orgs/community/discussions/45830 for one such case.
> +
> +While we won't promise that the output won't change in the future, we
> +are aware of these users, and will try to avoid changing it
> +willy-nilly. Furthermore, we make the following promises:
> +
> +* The default gzip compression tool will continue to be gzip(1). If
> + you rely on this being e.g. GNU gzip for the purposes of stability,
> + it's up to you to ensure that its output is stable across
> + versions.
> ++
> +
> +We in turn promise to not e.g. make the internal "git archive gzip"
> +implementation the default, as it produces different ouput than
> +gzip(1) in some case.

I think this is fine up to here.

> +* We will do our best not to change the "tar" output itself, but won't
> + promise that we're never going to change it.
> ++
> +If you must avoid using "git" itself for the tree validation, you
> +should be checksumming the uncompressed "tar" output, not e.g. the
> +compressed "tgz" output.
> ++

I don't think I want to state this, because it implies that the changes
I made that broke kernel.org (making tar.umask apply to pax headers)
wouldn't have been allowed. We should probably just state that "we
won't promise that the tar output won't change between versions".

Maybe, "We won't change the tar output needlessly, but it may change
from time to time." That is, we won't be "let's change the format just
to mix it up for users", but if there's a valuable patch that could be
applied, then we might well take it.

As I said, it's my goal to provide more concrete guarantees in a future
patch, probably this weekend.

> +* We promise that a given version of git will emit stable "tar" output
> + for the same tree ID (but not commit ID, see the discussion in the
> + <<DESCRIPTION>> section above).

I think that section contradicts this. The tree version uses the
current timestamp, which would make the archive change based on the time
of day.

> +While you shouldn't assume that different versions of git will emit
> +the same output, you can assume (e.g. for the purposes of caching)
> +that a given version's output is stable.

Unfortunately, this isn't actually true if someone uses export-subst.
That's because adding unrelated objects can increase the length of
abbreviations, and then the tar contents can be different. I've
actually seen this in the wild.

Modulo that, yes, I agree with this.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
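
To make the timestamp point above concrete, here is a minimal sketch. It
does not run git: Python's tarfile module stands in for `git archive`,
and `make_tar` and `sha256_hex` are hypothetical helpers invented for
this illustration. It only shows that a tar stream whose entries carry
the current time hashes differently from one run to the next, even when
the file contents are identical.

```python
import hashlib
import io
import tarfile


def make_tar(content: bytes, mtime: int) -> bytes:
    """Build a one-file pax tar in memory; only mtime varies between calls."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name="README")
        info.size = len(content)
        info.mtime = mtime  # stand-in for "the current timestamp"
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw (uncompressed) tar bytes."""
    return hashlib.sha256(data).hexdigest()


content = b"hello\n"
same_a = sha256_hex(make_tar(content, mtime=1700000000))
same_b = sha256_hex(make_tar(content, mtime=1700000000))
later = sha256_hex(make_tar(content, mtime=1700000001))

print(same_a == same_b)  # same content and same timestamp
print(same_a == later)   # same content, timestamp one second later
```

The first comparison is between byte-identical archives; the second
differs only in the header timestamp, which is enough to change the
checksum of the whole archive.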