Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution

On 2023-02-01 at 23:37:19, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > I don't think a blurb is necessary, but you're basically underscoring
> > the problem, which is that nobody is willing to promise that compression
> > is consistent, but yet people want to rely on that fact.  I'm willing to
> > write and implement a consistent tar spec and to guarantee compatibility
> > with that, but the tension here is that people also want gzip to never
> > change its byte format ever, which frankly seems unrealistic without
> > explicit guarantees.  Maybe the authors will agree to promise that, but
> > it seems unlikely.
> 
> Just to step back a bit, where does the distinction between
> guaranteeing the tar format stability and gzip compressed bitstream
> stability come from?  At both levels, the same thing can be
> expressed in multiple different ways, I think, but spelling out how
> exactly the compressor compresses is more involved than spelling out
> how entries in a tar archive is ordered and each entry is expressed,
> or something?

Yes, at least as I understand how gzip and compression in general
work.

The tar format (and the pax format which builds on it) can mostly be
pinned down by specifying what data is to be included in the pax and tar
headers and how it is to be formatted.  If we say that we will always
write such and such information in the pax header and sort the keys, and
that we will write such and such information in the tar header, then the
format is completely deterministic, and we can make nice guarantees.
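
To make the "sort the keys" idea concrete, here is a minimal sketch in
Python.  It is not Git's code and not a proposed spec; the function
names are made up for illustration.  It relies only on the pax record
layout from POSIX ("LENGTH KEY=VALUE\n", where LENGTH counts the whole
record including its own digits) and shows that once the key order is
fixed, the same set of records always serializes to the same bytes.

```python
def pax_record(key: str, value: str) -> bytes:
    """Encode one pax extended-header record: b"<len> <key>=<value>\\n".

    The length field counts the entire record, including its own digits,
    so it has to be found by a small fixed-point iteration.
    """
    payload = f" {key}={value}\n".encode("utf-8")
    size = len(payload)
    total = size + len(str(size))
    # Adding the digits can push the total across a power of ten.
    while len(str(total)) + size != total:
        total = len(str(total)) + size
    return str(total).encode("ascii") + payload


def pax_header_body(records: dict) -> bytes:
    """Serialize records in sorted key order (hypothetical helper).

    Sorting removes the only freedom left in this sketch: the order in
    which the caller happened to supply the keys.
    """
    return b"".join(pax_record(k, records[k]) for k in sorted(records))


a = pax_header_body({"path": "foo", "comment": "abc"})
b = pax_header_body({"comment": "abc", "path": "foo"})
print(a == b)
```

A real specification would also have to fix which keys are written at
all and how the tar header fields (mode, mtime, owner and so on) are
filled in; the sketch only covers the ordering part.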

My understanding of how Lempel-Ziv-based compression algorithms work is
that the compressor has a lot more freedom in deciding how best to
compress things, and that there isn't always a single obvious choice,
but I will admit my understanding is relatively limited.  If someone
thinks we can make real guarantees about compressed output rather than
just relying on gzip, I would be delighted to be shown to be wrong.
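
The freedom described above can be observed with Python's zlib module,
used here only as a convenient DEFLATE implementation (an assumption
for illustration; it is not the gzip binary that git-archive invokes).
The sketch compresses the same input at two levels as raw DEFLATE, with
no gzip or zlib header involved.  Both streams decode to the same data,
but the compressed bytes are not required to match and typically do
not.

```python
import zlib


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress to a raw DEFLATE stream (negative wbits = no header)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(blob: bytes) -> bytes:
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(blob, -15)


# Arbitrary sample input with some repetition for the matcher to find.
sample = b"".join(b"entry %d: the quick brown fox\n" % (i % 37) for i in range(2000))

fast = raw_deflate(sample, 1)
best = raw_deflate(sample, 9)

# Both are valid encodings of the same data ...
print(raw_inflate(fast) == sample, raw_inflate(best) == sample)
# ... but nothing forces the encoders to make the same choices.
print("identical bytes:", fast == best, "sizes:", len(fast), len(best))
```

Running the same level twice with the same library build does give the
same bytes, which is the weaker kind of stability.  What the format
does not promise is that a different level, a different version, or a
different implementation will make the same encoding choices.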

> > That would probably break things, because gzip is GPLv3, and we'd need
> > to ship a much older GPLv2 gzip, which would probably differ from the
> > current behaviour, and might also have some security problems.
> 
> Yup, security issues may make bit-for-bit-stability unrealistic.
> IIRC, the last time we had discussion on this topic, we settled
> on stability across the same version of Git (i.e. deterministic
> result)?

Yes, I think that's what we agreed.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
