Re: [RFC PATCH 1/1] Document a fixed tar format for interoperability

On Sun, Feb 05 2023, brian m. carlson wrote:

> +The goals for this format are that it is first and foremost reproducible, that
> +identical trees produce identical results, that it is simple and easy to
> +implement correctly, and that it is useful in general.  While we don't consider
> +functionality needs beyond Git's at the moment (such as hardlinks, xattrs, or
> +sparse files), there is intense interest in reproducible builds, and so it makes
> +sense to design something that can see general use for software interchange.

I think a goal should be to be bit-for-bit compatible with what we've
had historically, which...

> +Object IDs are not included in this version of the format because this produces
> +non-identical data when identical data is serialized with different hash
> +algorithms.

...this is inherently at odds with. I had a longer comment about why I
think we can have our cake & eat it too at
https://lore.kernel.org/git/230131.86tu06rkbp.gmgdl@xxxxxxxxxxxxxxxxxxx/

Maybe there are other changes in the proposed spec that put it at odds
with such a goal; it's unclear to me whether this is the only difference.

But I don't see why we need bit-for-bit compatible output between SHA-1
and SHA-256 git repos for the reasons noted in the linked-to reply, and
removing this will remove a *really useful* aspect of our tar format,
which is that you can grab an arbitrary tarball, and see what commit
it's produced from.
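
To illustrate what would be lost: today "git archive" records the commit
ID as a "comment" record in a pax global extended header, which is what
"git get-tar-commit-id" reads back. Below is a minimal Python sketch of
the same lookup using the standard tarfile module. The helper names
(read_tar_comment, make_demo_archive) are made up for this example, and
the demo archive is built in memory rather than by git.

```python
import io
import tarfile


def read_tar_comment(data: bytes):
    """Return the 'comment' record of the pax global header, or None."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        # tarfile exposes pax global header records as a plain dict.
        return tar.pax_headers.get("comment")


def make_demo_archive(commit_id: str) -> bytes:
    """Build a tiny pax archive carrying commit_id as a global comment.

    This only stands in for real "git archive" output in the demo.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": commit_id}) as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_demo_archive("0123456789abcdef0123456789abcdef01234567")
    print(read_tar_comment(demo))
```

An archive written without that global header simply yields None here,
which is the situation the proposed format would make universal.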

Even if you want to retain SHA-1 and SHA-256 interop as far as tar is
concerned, an un-discussed alternative is to just stick the SHA-1 OID
into the SHA-256 archive.

For repos that are migrated we envision having such a bi-directional
mapping anyway.

And for those that started out as SHA-256, or where we no longer care
about compatibility with old SHA-1, we can just start including the
SHA-256 OID, as all compatibility concerns will have gone away once we
stop bothering to maintain the mapping, no?

> +|===
> +| Field Name | Value
> +
> +| `name`     | the last path component if it fits; otherwise, `path.%d`
> +| `mode`     | `0640` (regular file), `0777` (symbolic link), `0750` (directory)
> +| `uid`      | `0`
> +| `gid`      | `0`
> +| `size`     | the size of the data in bytes for regular files if it fits; otherwise, `0`
> +| `mtime`    | `0` (the Epoch)
> +| `chksum`   | as specified in the standard

This is the nth reference to "the standard". I think this would be
improved by linking to it. Isn't it
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html ?
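
As a rough illustration of the fixed fields quoted above, here is a
Python sketch that builds a single ustar header block with mode 0640,
uid 0, gid 0 and mtime 0, then recomputes the checksum the way I
understand the pax specification to describe it (sum of all header
bytes, with the 8-byte chksum field counted as spaces). The function
names are invented for the example, and it covers only the header block
for a regular file, not the rest of the proposed format.

```python
import tarfile


def fixed_header(name: str, size: int) -> bytes:
    """Return a 512-byte ustar header using the fixed field values."""
    info = tarfile.TarInfo(name)
    info.mode = 0o640   # regular file mode from the quoted table
    info.uid = 0
    info.gid = 0
    info.size = size
    info.mtime = 0      # the Epoch
    return info.tobuf(format=tarfile.USTAR_FORMAT)


def header_checksum(header: bytes) -> int:
    """Sum every header byte, treating chksum (offsets 148-155) as spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:])


header = fixed_header("hello.txt", 6)
# tarfile stores the checksum as six octal digits followed by NUL and space.
stored = int(header[148:154], 8)

if __name__ == "__main__":
    print(stored == header_checksum(header))
    # Same inputs give the same bytes, since no field depends on the clock.
    print(fixed_header("hello.txt", 6) == header)
```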