[PATCH] sha1_file.c: zlib can only process 4GB at a time

The sizes of objects we read from the repository and of data we try to put
into the repository are represented as "unsigned long", so that on
architectures where that type is 64 bits wide we can handle objects that
weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
effectively limits the avail_in (how many bytes of input we are calling
zlib with) and avail_out (how many bytes of output from zlib we are ready
to accept) fields to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
the avail_in field of the stream, but that will truncate the high octets
of the real size.
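
To illustrate the truncation, here is a minimal sketch. It assumes uInt is
32 bits wide and models it with fixed-width types; the helper names are
hypothetical and come from neither zlib nor our code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "stream.avail_in = len" when avail_in is a
 * 32-bit unsigned field: converting to the narrower unsigned type keeps
 * only the low 32 bits, i.e. the value is reduced modulo 2^32.
 */
static uint32_t assign_to_avail_in(uint64_t size)
{
	return (uint32_t)size;
}

/* A 4GB + 5 byte buffer would be announced to zlib as 5 bytes. */
static void truncation_demo(void)
{
	uint64_t real_size = ((uint64_t)1 << 32) + 5;

	assert(assign_to_avail_in(real_size) == 5);
}
```

Sizes that already fit in 32 bits pass through unchanged, which is why the
problem only shows up with very large objects.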

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---

 * There are a lot more places than just this call site, but I wanted to
   send this out to get a quick sanity check from other people on whether
   the approach to fixing these issues is sane.

   Of course, we should be using streaming in more codepaths, but that is
   totally a separate issue. We would want our code to work correctly when
   streaming is not used and your machine is beefy enough with zettabytes
   of RAM, and that is the topic of this patch.
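
   The capping approach can be sketched in isolation, without zlib, as
   follows. This is only an illustration under stated assumptions: the
   per-call limit is modelled as UINT32_MAX, the names CHUNK_MAX,
   chunk_cap() and feed_in_chunks() are hypothetical, and the sketch
   pretends that every call consumes the whole chunk it was offered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call limit standing in for the range of zlib's uInt. */
#define CHUNK_MAX UINT32_MAX

/* Clamp a 64-bit byte count to what fits in a single call. */
static uint32_t chunk_cap(uint64_t len)
{
	return len > CHUNK_MAX ? CHUNK_MAX : (uint32_t)len;
}

/*
 * Walk over "total" bytes in capped chunks, refreshing the per-call
 * count after every call. Returns the number of calls made and stores
 * the number of bytes handed over in *fed.
 */
static unsigned feed_in_chunks(uint64_t total, uint64_t *fed)
{
	uint64_t remaining = total;
	unsigned calls = 0;

	*fed = 0;
	do {
		uint32_t avail_in = chunk_cap(remaining);

		/*
		 * A real caller would invoke deflate() or inflate() here
		 * and subtract only what was actually consumed; this
		 * sketch assumes the whole chunk was consumed.
		 */
		*fed += avail_in;
		remaining -= avail_in;
		calls++;
	} while (remaining);

	assert(*fed == total);
	return calls;
}
```

   A 4GB buffer (2^32 bytes) takes two calls here, because a single call
   can carry at most 2^32 - 1 bytes. One thing a real loop would
   presumably need to watch is the flush argument: zlib treats Z_FINISH
   as "no more input will follow", so it likely should be passed only
   once the final chunk has been placed in avail_in.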

 cache.h     |   11 +++++++++++
 sha1_file.c |   11 +++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index e11cf6a..b917ae5 100644
--- a/cache.h
+++ b/cache.h
@@ -24,6 +24,17 @@ void git_inflate_init(z_streamp strm);
 void git_inflate_end(z_streamp strm);
 int git_inflate(z_streamp strm, int flush);
 
+/*
+ * avail_in and avail_out are counted in uInt, which typically limits
+ * the size of the buffer we can use to 4GB when interacting with zlib
+ * in a single call to inflate/deflate.
+ */
+#define ZLIB_BUF_MAX ((uInt)-1)	/* avoids 1UL << 32, undefined if long is 32-bit */
+static inline uInt zlib_buf_cap(unsigned long len)
+{
+	return (ZLIB_BUF_MAX < len ? ZLIB_BUF_MAX : len);
+}
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
diff --git a/sha1_file.c b/sha1_file.c
index 064a330..51236ab 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2429,6 +2429,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	unsigned char parano_sha1[20];
 	char *filename;
 	static char tmpfile[PATH_MAX];
+	unsigned long bytes_to_deflate;
 
 	filename = sha1_file_name(sha1);
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
@@ -2454,14 +2455,20 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	git_SHA1_Update(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
+	bytes_to_deflate = len;
 	stream.next_in = (void *)buf;
-	stream.avail_in = len;
+	stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 	do {
 		unsigned char *in0 = stream.next_in;
+		size_t consumed;
+
 		ret = deflate(&stream, Z_FINISH);
-		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		consumed = stream.next_in - in0;
+		git_SHA1_Update(&c, in0, consumed);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
+		bytes_to_deflate -= consumed;
+		stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);