Re: [RFC PATCH 1/1] Document a fixed tar format for interoperability

On 2023-02-06 at 22:18:47, Ævar Arnfjörð Bjarmason wrote:
> Maybe there are other changes in the proposed spec that put it at odds
> with such a goal, it's unclear to me if this is the only difference.

As mentioned in the description, that doesn't address trees, which have
never been consistent traditionally.  We also have bad permissions for
pax headers (always 666), which is something we've tried to fix before
and is not something we want to carry on with.

You specifically sent a patch stating that we're not guaranteeing that
format, and I agree with that assessment.  I'm proposing a format that
we would guarantee and which does not have any of the historical baggage
or warts that we don't want to keep.

This format also doesn't serialize timestamps; everything is at the
Epoch.  Again, that's because serializing a commit and its tree or even
a tag and its commit would produce different results.
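
To illustrate why pinning the timestamp matters, here is a minimal sketch
(not the proposed format itself) that builds a tar whose bytes depend only
on the file names and contents.  The function name `build_fixed_tar`, the
choice of ustar, and the 0644 mode are assumptions made for the example.

```python
import io
import tarfile


def build_fixed_tar(files):
    """Return tar bytes for a {name: bytes} mapping, with no varying metadata.

    Hypothetical helper for illustration only; ustar is used here just to
    keep the sketch small and is not the layout proposed in the patch.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort the names so the input ordering cannot change the output.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0            # everything is at the Epoch
            info.mode = 0o644         # assumed fixed mode for the example
            info.uid = info.gid = 0   # no local user or group information
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Because nothing in the headers comes from the commit, the tag, or the
clock, archiving the same tree twice should give identical bytes.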

> But I don't see why we need bit-for-bit compatible output between SHA-1
> and SHA-256 git repos for the reasons noted in the linked-to reply, and
> removing this will remove a *really useful* aspect of our tar format,
> which is that you can grab an arbitrary tarball, and see what commit
> it's produced from.

True, but this is a highly obscure feature and I've never used it
outside of testing.  If you want it, you can have it: you just want the
default format, which serializes it in the header, and not the extremely
restricted format I'm proposing here which is designed to never ever
change.  We might well decide to add cool new features and useful
information to the default format, but this one will be fixed forever.
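
For reference, the default format records the commit ID as the `comment`
field of a pax global header.  A rough sketch of reading it back with
Python's `tarfile` module follows; the function name `tar_commit_id` is
made up for the example, and it assumes an uncompressed archive that is
already in memory.

```python
import io
import tarfile


def tar_commit_id(data):
    """Return the pax global header "comment" value, or None if absent.

    Hypothetical helper for illustration; `data` is the raw bytes of an
    uncompressed tar archive.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        # tarfile collects pax global header records in TarFile.pax_headers.
        return tar.pax_headers.get("comment")
```

This is the same field that `git get-tar-commit-id` reads.  An archive
without a global header simply yields None here.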

> Even if you want to retain SHA-1 and SHA-256 interop as far as tar is
> concerned, an un-discussed alternative is to just stick the SHA-1 OID
> into the SHA-256 archive.
> 
> For repos that are migrated we envision having such a bi-directional
> mapping anyway.
> 
> And for those that started out as SHA-256, or where we no longer care
> about compatibility with old SHA-1, we can just start including the
> SHA-256 OID, as all compatibility concerns have gone away when we
> stopped bothering to maintain the mapping, no?

Whether SHA-1 or SHA-256 or both are present in the repo is a local
decision.  The transition plan specifically anticipates people either
preferring one hash or the other in output.  The behaviour is not "use
SHA-1 if there's SHA-1 and use SHA-256 otherwise", because even if
everyone has SHA-256 and prefers it on their system, some people may
still have SHA-1 for historical reasons and that would lead to different
output.

Part of this is because I anticipate that once the interop work is done,
GitHub may transition repositories on the server to SHA-256 with SHA-1
interop for existing SHA-1 repositories.  People are still going to have
a fit if tarball data breaks at some point because the repository owner
decided to flip the default hash algorithm, and I'm specifically
proposing a format that is not going to direct hordes of angry users in
my direction or the repository owner's in that case.  Lots of people are
going to avoid switching the default hash algorithm if it breaks
tarballs, and I specifically don't want to encourage people sticking
with SHA-1 for that reason.

> This is the nth reference to "the standard". I think this would be
> improved by linking to it, isn't it
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html ?

Yeah, I'll do that.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

