Re: [PATCH 5/5] archive-zip: stream large blobs into zip file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 30.04.2012 21:12, schrieb Junio C Hamano:
Nguyễn Thái Ngọc Duy<pclouds@xxxxxxxxx>  writes:

A large blob will be read twice. One for calculating crc32, one for
actual writing.

Is that because you need the checksum before the payload?  That is
unfortunate.  It would be nice (read: not a rejection of this patch---it
is a good first step to do it stupid but correct way before trying to
optimize it) to avoid it when the output is seekable, especially because
we are talking about a *large* payload.

The ZIP format optionally allows writing the CRC and the sizes after the data. This adds a data descriptor with a size of 16 bytes to each streamed entry. Seeking back and correcting these values in an output file would avoid that.

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index fe47554..458fdde 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -138,4 +138,8 @@ test_expect_success 'tar achiving' '
  	git archive --format=tar HEAD>/dev/null
  '

+test_expect_success 'zip achiving' '
+	git archive --format=zip HEAD>/dev/null
+'

Can't we do better than "we only check if it finishes without barfing; we
cannot afford to check the correctness of the output"?  The same comment
applies to all the tests you added to this file in the past 3 months.

Streaming to tar can be tested by setting core.big_file_threshold big enough, creating a non-streamed version of the archive and comparing it to the streamed one. With the seek trick, this would work for ZIP as well. For streaming with an added data descriptor we'd need to actually unpack the ZIP file, though.

René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]