On 2023-02-06 at 22:18:47, Ævar Arnfjörð Bjarmason wrote:
> Maybe there are other changes in the proposed spec that put it at odds
> with such a goal, it's unclear to me if this is the only difference.

As mentioned in the description, that doesn't address trees, which have
never been consistent traditionally. We also have bad permissions for
pax headers (always 666), which is something we've tried to fix before
and is not something we want to carry on with.

You specifically sent a patch stating that we're not guaranteeing that
format, and I agree with that assessment. I'm proposing a format that we
would guarantee and which does not have any of the historical baggage or
warts that we don't want to keep.

This format also doesn't serialize timestamps; everything is at the
Epoch. Again, that's because serializing a commit and its tree or even a
tag and its commit would produce different results.

> But I don't see why we need bit-for-bit compatible output between SHA-1
> and SHA-256 git repos for the reasons noted in the linked-to reply, and
> removing this will remove a *really useful* aspect of our tar format,
> which is that you can grab an arbitrary tarball, and see what commit
> it's produced from.

True, but this is a highly obscure feature and I've never used it
outside of testing. If you want it, you can have it: you just want the
default format, which serializes it in the header, and not the extremely
restricted format I'm proposing here which is designed to never ever
change. We might well decide to add cool new features and useful
information to the default format, but this one will be fixed forever.

> Even if you want to retain SHA-1 and SHA-256 interop as far as tar is
> concerned, an un-discussed alternative is to just stick the SHA-1 OID
> into the SHA-256 archive.
>
> For repos that are migrated we envision having such a bi-directional
> mapping anyway.
>
> And for those that started out as SHA-256, or where we no longer care
> about compatibility with old SHA-1, we can just start including the
> SHA-256 OID, as all compatibility concerns have gone away when we
> stopped bothering to maintain the mapping, no?

Whether SHA-1 or SHA-256 or both are present in the repo is a local
decision. The transition plan specifically anticipates people either
preferring one hash or the other in output. The behaviour is not "use
SHA-1 if there's SHA-1 and use SHA-256 otherwise", because even if
everyone has SHA-256 and prefers it on their system, some people may
still have SHA-1 for historical reasons and that would lead to different
output.

Part of this is because I anticipate that once the interop work is done,
GitHub may transition repositories on the server to SHA-256 with SHA-1
interop for existing SHA-1 repositories. People are still going to have
a fit if tarball data breaks at some point because the repository owner
decided to flip the default hash algorithm, and I'm specifically
proposing a format that is not going to direct hordes of angry users in
my direction or the repository owner's in that case. Lots of people are
going to avoid switching the default hash algorithm if it breaks
tarballs, and I specifically don't want to encourage people sticking
with SHA-1 for that reason.

> This is the nth reference to "the standard". I think this would be
> improved by linking to it, isn't it
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html ?

Yeah, I'll do that.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA