On Thu, Feb 02 2023, brian m. carlson wrote:

>> +* We will do our best not to change the "tar" output itself, but won't
>> + promise that we're never going to change it.
>> ++
>> +If you must avoid using "git" itself for the tree validation, you
>> +should be checksumming the uncompressed "tar" output, not e.g. the
>> +compressed "tgz" output.
>> ++
>
> I don't think I want to state this, because it implies that the changes
> I made that broke kernel.org (making tar.umask apply to pax headers)
> wouldn't have been allowed.

I don't see how "we'll do our best, but it might change" precludes
that...

> We should probably just state that "we
> won't promise that the tar output won't change between versions".

Maybe, ...but it sounds like you'd like this "softer" promise. I think
it's saying the same, but picked the "we'll try not to" wording because
I think it more accurately reflects reality, but...

> "We won't change the tar output needlessly, but it may change from time
> to time." That is, we won't be "let's change the format just to mix it
> up for users", but if there's a valuable patch that could be applied,
> then we might well take it.

...here we're back (at least per my reading) to basically what my
proposed patch said.

I'm happy to improve/change the wording, but I'm confused about the
"because it implies" part you noted.

> As I said, it's my goal to provide more concrete guarantees in a future
> patch, probably this weekend.

I think that would be great, but also think that if we're going to make
new guarantees it's probably best applied on top of a series such as
this, which aside from the reverting back to gzip as the default
attempts to clarify the status quo.

>
>> +* We promise that a given version of git will emit stable "tar" output
>> + for the same tree ID (but not commit ID, see the discussion in the
>> + <<DESCRIPTION>> section above).
>
> I think that section contradicts this. The tree version uses the
> current timestamp, which would make the archive change based on the time
> of day.

Thanks! It's referring back to the previous discussion, but I managed to
somehow get the tree & commit cases reversed.

>> +While you shouldn't assume that different versions of git will emit
>> +the same output, you can assume (e.g. for the purposes of caching)
>> +that a given version's output is stable.
>
> Unfortunately, this isn't actually true if someone uses export-subst.
> That's because adding unrelated objects can increase the length of
> abbreviations, and then the tar contents can be different. I've
> actually seen this in the wild.
>
> Modulo that, yes, I agree with this.

I didn't know about the export-subst case, I'll add that caveat in
there. Thanks!
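
As an aside on the "checksum the uncompressed tar, not the tgz" advice
in the proposed text: here's a minimal sketch of why that matters. It's
plain Python and doesn't invoke git at all. The helper names
(make_tar, tar_digest_from_tgz) and the in-memory archive are made up
for illustration, and the two gzip compression levels merely stand in
for "two different gzip implementations or settings".

```python
import gzip
import hashlib
import io
import tarfile


def make_tar() -> bytes:
    """Build a tiny deterministic tar in memory (stand-in for real archive output)."""
    payload = b"hello, world\n" * 100
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        info.mtime = 0  # fixed timestamp so the tar bytes are reproducible
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def tar_digest_from_tgz(tgz: bytes) -> str:
    """Checksum the decompressed tar stream, ignoring how it was compressed."""
    return sha256_hex(gzip.decompress(tgz))


tar_bytes = make_tar()

# Same tar payload, two different compressor settings.
tgz_fast = gzip.compress(tar_bytes, compresslevel=1, mtime=0)
tgz_best = gzip.compress(tar_bytes, compresslevel=9, mtime=0)

# Checksums of the compressed bytes are not expected to match ...
print("tgz digests equal:", sha256_hex(tgz_fast) == sha256_hex(tgz_best))
# ... while checksums of the decompressed tar stream are.
print("tar digests equal:",
      tar_digest_from_tgz(tgz_fast) == tar_digest_from_tgz(tgz_best))
```

This only covers changes in the compression step. It does nothing for
the cases discussed above where the tar stream itself changes (a
different git version, or export-subst abbreviations growing).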