"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

>> +* We will do our best not to change the "tar" output itself, but won't
>> +  promise that we're never going to change it.
>> ++
>> +If you must avoid using "git" itself for the tree validation, you
>> +should be checksumming the uncompressed "tar" output, not e.g. the
>> +compressed "tgz" output.
>> ++
>
> I don't think I want to state this, because it implies that the changes
> I made that broke kernel.org (making tar.umask apply to pax headers)
> wouldn't have been allowed. We should probably just state that "we
> won't promise that the tar output won't change between versions". Maybe,
> "We won't change the tar output needlessly, but it may change from time
> to time." That is, we won't be "let's change the format just to mix it
> up for users", but if there's a valuable patch that could be applied,
> then we might well take it.

I agree with you. Giving "will do our best not to" is still too
strong for that. We won't change the format willy-nilly but when
there is a good reason to do so, we should be able to fix or improve
the output.

>> +While you shouldn't assume that different versions of git will emit
>> +the same output, you can assume (e.g. for the purposes of caching)
>> +that a given version's output is stable.
>
> Unfortunately, this isn't actually true if someone uses export-subst.
> That's because adding unrelated objects can increase the length of
> abbreviations, and then the tar contents can be different. I've
> actually seen this in the wild.

"subst" is certainly an issue, especially when the substitution is
unstable.

There shouldn't be cross platform differences to break bit-for-bit
stability at least for "tar" format, as we do not rely on any
external library. Can we say the same for "zip"? I thought we
throw the blob at git_deflate_*() so the exact bitstream is up to
the libz implementation?
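
To illustrate the advice in the quoted patch text about checksumming
the uncompressed "tar" stream rather than the "tgz" bytes, here is a
minimal Python sketch. It does not involve git at all: make_tar(),
sha256_hex(), sha256_of_uncompressed() and the sample file are
hypothetical names made up for this example, and Python's gzip module
merely stands in for whatever compressor a site actually runs. The two
wrappers deliberately differ in compression level and in the timestamp
stored in the gzip header.

```python
import gzip
import hashlib
import io
import tarfile


def make_tar(files):
    """Build an uncompressed tar archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed metadata so the tar bytes are reproducible
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_uncompressed(tgz_bytes):
    """Hash the tar stream inside a gzip wrapper, not the compressed bytes."""
    return sha256_hex(gzip.decompress(tgz_bytes))


tar_bytes = make_tar({"README": b"hello\n" * 1000})

# Same payload, two wrappers: different level and different header timestamp.
tgz_a = gzip.compress(tar_bytes, compresslevel=1, mtime=0)
tgz_b = gzip.compress(tar_bytes, compresslevel=9, mtime=1)

print("compressed bytes identical:  ", tgz_a == tgz_b)
print("uncompressed digest identical:",
      sha256_of_uncompressed(tgz_a) == sha256_of_uncompressed(tgz_b))
```

The checksum taken over the decompressed stream only depends on the
"tar" bytes, so it survives a change of compressor settings or
implementation. It says nothing about whether the "tar" bytes
themselves stay the same between git versions, which is the separate
question discussed above.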