Re: git --archive

On 2022-09-22 at 20:35:08, Scheffenegger, Richard wrote:
> Also, at least for ZIP (not so much for TAR), objects residing in
> different subdirectories can be stored in any order - and only need to
> be referenced properly in the central directory. Thus whenever a
> subthread has completed the reading of a (sufficiently small) object
> to be in (git program) memory, it should be sent immediately to the
> ZIP writer thread. The result would be that small and hot files (which
> can be read in quickly) end up at the beginning of the zip file, but
> the parallel threads can already, in parallel, read in larger and
> colder objects - the absolute wait time within the worker thread
> reading those objects may be slightly higher, but as many objects are
> read in in parallel, the absolute time to create the archive would be
> minimized.

Maybe they can technically be stored in any order, but people don't want
git archive to produce non-deterministic archives.  I'm one of the folks
responsible for the service at GitHub that serves archives (which uses
git archive under the hood), and people become very unhappy when the
archives are not bit-for-bit identical, even though neither Git nor
GitHub guarantees that.  That's because people want to verify those
archives with cryptographic hashes like SHA-256, and if the bytes of the
file change, the recorded hash no longer matches.  (We tell them to
generate a tarball as part of the release process and upload it as a
release asset instead.)
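
To illustrate why entry order matters for hashes, here is a minimal
Python sketch (not Git's implementation).  It builds two in-memory ZIP
files from the same made-up entries in different orders, with a fixed
timestamp so that order is the only variable.  The file names, contents
and helper names are assumptions for illustration only.

```python
import hashlib
import io
import zipfile

# Constant timestamp so that entry order is the only thing that varies.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_zip(entries):
    """Return the bytes of a ZIP whose members are written in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in entries:
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            zf.writestr(info, data)
    return buf.getvalue()


def sha256_hex(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def members(zip_bytes):
    """Map member name to content, ignoring storage order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical entries: one small "hot" file and one larger one.
files = [("a/small.txt", b"hot\n"), ("b/large.bin", b"\0" * 4096)]

in_order = build_zip(files)
reordered = build_zip(list(reversed(files)))

same_content = members(in_order) == members(reordered)
same_hash = sha256_hex(in_order) == sha256_hex(reordered)
```

Both archives extract to the same files, so same_content is true, but
the bytes on disk differ, so same_hash is false and any checksum
recorded for one archive fails against the other.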

What Git does implicitly guarantee is that the result is deterministic:
that is, given the same repository and the same version of Git, the
archive is identical.  The encoding may change across versions, but not
within a version.  I feel like it would be very difficult to achieve the
speedups you want and still produce a deterministic archive.
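
A rough sketch of the tension, in Python rather than Git's C code and
under stated assumptions: objects can be read in parallel, but to keep
the output deterministic the writer still has to emit them in one fixed
order.  Here read_object is a hypothetical stand-in for reading a blob,
and sorting by name is just one possible fixed order, not necessarily
the one git archive uses.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

# Constant timestamp so the output depends only on names and contents.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def read_object(item):
    """Hypothetical stand-in for reading one object from the object store."""
    name, data = item
    return name, data


def archive_deterministically(objects, workers=4):
    """Read objects in parallel, but write them in sorted-name order."""
    ordered = sorted(objects.items())
    buf = io.BytesIO()
    with ThreadPoolExecutor(max_workers=workers) as pool, \
            zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        # Executor.map yields results in submission order, not completion
        # order, so a slow early object holds back every later one.
        for name, data in pool.map(read_object, ordered):
            zf.writestr(zipfile.ZipInfo(name, date_time=FIXED_TIME), data)
    return buf.getvalue()


# Hypothetical repository contents, given in two different orders.
objects = {"b/large.bin": b"\0" * 4096, "a/small.txt": b"hot\n"}
first = archive_deterministically(objects)
second = archive_deterministically(dict(reversed(list(objects.items()))))
```

The two results are byte-for-byte identical, but only because the writer
waits for objects in order.  Small objects that finish early sit in
memory until everything sorted before them has been written, which gives
up part of the gain the completion-order scheme was after.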
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


