[PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip

gzip(1) encodes the OS it runs on in the 10th byte of its output. It
uses the following OS_CODE values according to its tailor.h [1]:

        0 - MS-DOS
        3 - UNIX
        5 - Atari ST
        6 - OS/2
       10 - TOPS-20
       11 - Windows NT
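
For illustration only (not part of this patch), reading that byte back
from a gzip stream could look roughly like the sketch below.  The helper
names gzip_os_code() and gzip_os_name() are made up for this example,
and the name table only covers the tailor.h values listed above:

```c
#include <stddef.h>

/* Offset of the OS field in a gzip member header (RFC 1952): the 10th byte. */
#define GZIP_OS_OFFSET 9

/*
 * Hypothetical helper: return the OS code stored in a gzip header,
 * or -1 if buf is too short or does not start with the gzip magic.
 */
static int gzip_os_code(const unsigned char *buf, size_t len)
{
	if (len <= GZIP_OS_OFFSET || buf[0] != 0x1f || buf[1] != 0x8b)
		return -1;
	return buf[GZIP_OS_OFFSET];
}

/* Hypothetical helper: name an OS code using the gzip(1) list above. */
static const char *gzip_os_name(int code)
{
	switch (code) {
	case 0:  return "MS-DOS";
	case 3:  return "UNIX";
	case 5:  return "Atari ST";
	case 6:  return "OS/2";
	case 10: return "TOPS-20";
	case 11: return "Windows NT";
	default: return "unknown";
	}
}
```

Feeding it the first ten bytes of a tgz archive is enough; the rest of
the stream is not inspected.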

The gzip.exe that comes with Git for Windows uses OS_CODE 3 for some
reason, so this value is used on practically all supported platforms
when generating tgz archives using gzip(1).

Zlib uses a bigger set of values according to its zutil.h [2], aligned
with section 4.4.2 of the ZIP specification, APPNOTE.txt [3]:

         0 - MS-DOS
         1 - Amiga
         3 - UNIX
         4 - VM/CMS
         5 - Atari ST
         6 - OS/2
         7 - Macintosh
         8 - Z-System
        10 - Windows NT
        11 - MVS (OS/390 - Z/OS)
        13 - Acorn Risc
        16 - BeOS
        18 - OS/400
        19 - OS X (Darwin)

Thus the internal gzip implementation in archive-tar.c sets different
OS_CODE header values on the major platforms Windows and macOS.  Git for
Windows has used its own zlib-based variant by default since v2.20.1 and
thus embeds OS_CODE 10 in tgz archives.

The tar archive for a commit is generated consistently on all systems
(by the same Git version).  The OS_CODE in the gzip header does not
influence extraction.  Avoid leaking OS information and make tgz
archives consistent and reproducible (with the same Git and libz
versions) by using OS_CODE 3 everywhere.
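
To check whether two archives made on different platforms really differ
only in that header byte, one could compare them while skipping it.  A
minimal sketch, not part of this patch: gzip_same_except_os() is a
hypothetical name, both archives are assumed to be fully loaded into
memory, and a match is only to be expected when the Git and libz
versions and the compression level are the same:

```c
#include <stddef.h>
#include <string.h>

#define GZIP_OS_OFFSET 9 /* 10th byte of the gzip header */

/*
 * Hypothetical helper: return 1 if two gzip streams are byte-for-byte
 * identical apart from the OS field, 0 otherwise.
 */
static int gzip_same_except_os(const unsigned char *a, size_t alen,
			       const unsigned char *b, size_t blen)
{
	if (alen != blen || alen <= GZIP_OS_OFFSET)
		return 0;
	/* Compare the bytes before and after the OS field separately. */
	return !memcmp(a, b, GZIP_OS_OFFSET) &&
	       !memcmp(a + GZIP_OS_OFFSET + 1, b + GZIP_OS_OFFSET + 1,
		       alen - GZIP_OS_OFFSET - 1);
}
```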

NB: The function deflateSetHeader() was introduced by zlib 1.2.2.1,
released 2004-10-31.

At least on macOS 12.4 this produces the same output as gzip(1) for the
examples I tried:

   # before
   $ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 | shasum
   3abbffb40b7c63cf9b7d91afc682f11682f80759  -

   # with this patch
   $ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 | shasum
   dc6dc6ba9636d522799085d0d77ab6a110bcc141  -

   $ git archive --format=tar v2.36.0 | gzip -cn | shasum
   dc6dc6ba9636d522799085d0d77ab6a110bcc141  -

[1] https://git.savannah.gnu.org/cgit/gzip.git/tree/tailor.h
[2] https://github.com/madler/zlib/blob/master/zutil.h
[3] https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Signed-off-by: René Scharfe <l.s.r@xxxxxx>
---
Perhaps this makes sense for remote-curl as well (out of scope of this
series)?

 archive-tar.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/archive-tar.c b/archive-tar.c
index 53d0ef685c..bf7e321e0e 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -460,6 +460,14 @@ static void tgz_write_block(const void *data)

 static const char internal_gzip_command[] = "git archive gzip";

+static void tgz_set_os(git_zstream *strm, int os)
+{
+#if ZLIB_VERNUM >= 0x1221
+	struct gz_header_s gzhead = { .os = os };
+	deflateSetHeader(&strm->z, &gzhead);
+#endif
+}
+
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args)
 {
@@ -473,6 +481,7 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!strcmp(ar->filter_command, internal_gzip_command)) {
 		write_block = tgz_write_block;
 		git_deflate_init_gzip(&gzstream, args->compression_level);
+		tgz_set_os(&gzstream, 3); /* Unix, for reproducibility */
 		gzstream.next_out = outbuf;
 		gzstream.avail_out = sizeof(outbuf);

--
2.36.1



