Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution

On Thu, Feb 02 2023, Theodore Ts'o wrote:

> On Thu, Feb 02, 2023 at 05:19:30PM -0400, Joey Hess wrote:
>> In my opinion as the original developer of pristine-tar, it's too
>> complicated to be usefully used by git. The problem it solves is of a
>> larger scope than the problem git has here. (I hope.)
>
> Well, the problem which I believe folks on this thread are trying to
> deal with is a way to reconstruct a bit-for-bit compressed tarball of
> a particular release in a way that minimizes the cost of storage in
> the git tree.  One way of doing that would be to guarantee that git
> archive would return something which is always bit-for-bit identical.
> Another way is to use something like pristine tar.

I think that's what this side-thread has devolved into, but I honestly
don't see how that's useful or more than tangentially related to the
problem noted at the start of the thread.

If you are writing a new system that consumes "git archive" output,
something like what I'm proposing to add in [1] should nicely sidestep
this issue: just checksum the uncompressed archive (assuming you're OK
with our soft "tar" guarantees), or use "git tag -v" (if you can), etc.
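As a rough illustration of "checksum the uncompressed archive", here is a
minimal sketch. The function name archive_payload_sha256 is made up for
this example, and it assumes gzip and GNU coreutils' sha256sum are
installed. It hashes the tar stream inside a .tar.gz, so the result does
not depend on which gzip implementation or compression level produced
the file:

```shell
# Hypothetical helper (not part of git): print the SHA-256 of the
# decompressed payload of a gzip file, ignoring the compression layer.
archive_payload_sha256() {
	# -d decompress, -c write to stdout; "--" guards against odd filenames.
	gzip -dc -- "$1" | sha256sum | cut -d' ' -f1
}
```

Something like "archive_payload_sha256 project-1.0.tar.gz" could then be
compared against "git archive --format=tar --prefix=project-1.0/ v1.0 |
sha256sum", where the file name, prefix and tag are placeholders. This
only helps to the extent that the tar output itself stays stable.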

That part of the docs is just a summary of what Konstantin Ryabitsev
pointed out in a side-thread.

One might also imagine any number of other trivial solutions to the
problem, e.g. people interested in this can unpack the archive and then
checksum its contents (this needs to guarantee sorted order, which I
think find(1) doesn't, but just as a POC):

	(cd unpacked && find . -type f -printf "%f\n" -exec cat {} \; | sha256sum)

Or whatever.
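
A variant of that POC with the ordering made explicit might look like
the sketch below. It is only an illustration: tree_sha256 is a
hypothetical name, and it assumes GNU find, sort and xargs (for
-print0, -z and -0) plus sha256sum. It also feeds each file's relative
path into the hash rather than only its basename, so two files with the
same name in different directories are told apart:

```shell
# Hypothetical helper: hash an unpacked tree in a stable, sorted order.
tree_sha256() {
	(
		cd "$1" || exit 1
		# NUL-separated names survive odd characters in paths;
		# LC_ALL=C makes the sort byte-wise, independent of the locale.
		find . -type f -print0 | LC_ALL=C sort -z |
		xargs -0 sh -c '
			for p do
				printf "%s\n" "$p"   # relative path, then the contents
				cat -- "$p"
			done
		' sh | sha256sum | cut -d" " -f1
	)
}
```

Running "tree_sha256 unpacked" on two machines should then give the same
digest for the same file names and contents. File modes, symlinks and
empty directories are not covered by this sketch.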

But any such solution to the abstract problem isn't going to help the
existing users whose systems broke because they were assuming certain
things about the "git archive" output.

For those users I think we should just do whatever we can to limit the
disruption, as my proposed series [2] does by switching back to "gzip".

For those users who are creating new systems that might use "git
archive" today we then just need to update the documentation going
forward. Maybe those could use "pristine-tar", or perhaps they can use
some entirely different distribution mechanism.

1. https://lore.kernel.org/git/patch-9.9-b40833b2168-20230202T093212Z-avarab@xxxxxxxxx/
2. https://lore.kernel.org/git/cover-0.9-00000000000-20230202T093212Z-avarab@xxxxxxxxx/


