[RFH] zlib gurus out there?


I've been staring at reusing existing data while packing, and
this occurred to me...

During packing, suppose that we choose to store an object in
base form, undeltified.  And also suppose we have that object
loose in a .git/objects/??/ directory.  We already have it in
deflated form, but with its own header.  I started wondering if
we can somehow reuse this.

A short object format brush-up lesson is in order here.  

* An undeltified object in a pack is represented like this:

 (1) the header is dense, variable-size binary data that
     encodes the type and inflated length;
 (2) deflated data immediately follows the header.
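
For illustration, here is a minimal sketch of how such a dense header
could be produced.  It assumes the layout I understand packs to use: the
first byte carries a continuation bit, a 3-bit type and the low 4 bits
of the length, and each following byte carries 7 more bits.  The
function name encode_pack_header and the type number used in the usage
note are assumptions made for this example, not taken from the code
discussed here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: encode an object type (1..7) and its inflated
 * size into out[], returning the number of bytes written.  out[] needs
 * room for up to 10 bytes when unsigned long is 64 bits wide.
 */
static size_t encode_pack_header(unsigned char *out, unsigned type,
				 unsigned long size)
{
	size_t n = 0;
	unsigned char c;

	assert(type >= 1 && type <= 7);

	/* First byte: 3-bit type and the low 4 bits of the size. */
	c = (unsigned char)((type << 4) | (size & 0x0f));
	size >>= 4;

	/* While bits remain, set the continuation bit and emit 7 more. */
	while (size) {
		out[n++] = c | 0x80;
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
	}
	out[n++] = c;
	return n;
}
```

Under these assumptions, a type-3 object of 11 bytes fits in a single
header byte, while one of 300 bytes needs two.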

* On the other hand, a loose object is represented like this:

 (1) the header looks like sprintf("%s %lu%c", type, len, 0);
 (2) the data is concatenated to the header;
 (3) the SHA1 checksum of the above becomes the object name;
 (4) the header and data are deflated using the same z_stream,
     in two steps, like this (sha1_file.c::write_sha1_file):

	/* Compress it */
	stream.next_out = compressed;
	stream.avail_out = size;

	/* First header.. */
	stream.next_in = hdr;
	stream.avail_in = hdrlen;
	while (deflate(&stream, 0) == Z_OK)
		/* nothing */;

	/* Then the data itself.. */
	stream.next_in = buf;
	stream.avail_in = len;
	while (deflate(&stream, Z_FINISH) == Z_OK)
		/* nothing */;
	deflateEnd(&stream);
	size = stream.total_out;
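
To make step (1) concrete, the following sketch builds the loose object
header using the format string quoted above and returns its length,
including the trailing NUL that separates it from the data.  The helper
name make_loose_header and its buffer-size parameter are hypothetical;
this is not the code in sha1_file.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<type> <len>" followed by a NUL byte
 * into hdr[] and return the header length, counting that NUL.
 */
static int make_loose_header(char *hdr, size_t hdrsize,
			     const char *type, unsigned long len)
{
	/* The %c with 0 emits the NUL, so it is counted in the result. */
	int n = snprintf(hdr, hdrsize, "%s %lu%c", type, len, 0);

	/* Fail loudly rather than hand back a truncated header. */
	assert(n > 0 && (size_t)n < hdrsize);
	return n;
}
```

For a type of "blob" and a length of 11 this yields the 8 bytes
"blob 11" plus NUL, which is what hdr and hdrlen hold in the snippet
above.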

So I thought... if we cause a full flush after the header part,
I can find the flush boundary in a loose object file and copy
the rest into the packfile I am generating, after placing the
binary encoded header.  If this works, we do not have to inflate
a loose object to read it and then deflate it again to store it
in the pack.  We will get better packing as well, since we
deflate loose objects with Z_BEST_COMPRESSION, while packs are
done with Z_DEFAULT_COMPRESSION.  When pack-objects reads a loose
object and detects that there is no full flush after the header,
it would do the traditional inflate-deflate cycle, so this would
be backward compatible.
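
One possible way to look for the flush boundary, sketched under the
assumption that the writer really did a full flush: zlib documents that
a sync or full flush ends with an empty stored block, whose last four
bytes are 00 00 ff ff.  The helper name find_flush_boundary is
hypothetical.  The same four bytes can also appear by chance inside
compressed data, so a hit is only a hint, and the code does not address
what to do about the zlib header and trailing checksum of the stream.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the offset just past the first
 * 00 00 ff ff marker in buf[], or 0 if no marker is found.
 */
static size_t find_flush_boundary(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);

	/* Compare four bytes at each position in a linear scan. */
	for (i = 0; i + 4 <= len; i++) {
		if (buf[i] == 0x00 && buf[i + 1] == 0x00 &&
		    buf[i + 2] == 0xff && buf[i + 3] == 0xff)
			return i + 4;
	}
	return 0;
}
```

A return value of 0 would mean falling back to the inflate-deflate
cycle.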

However, I am stuck with the first step, which is to do a full
flush after the header.  An obvious change to the code quoted
above writes out a corrupt object:

	/* First header.. */
	stream.next_in = hdr;
	stream.avail_in = hdrlen;
-	while (deflate(&stream, 0) == Z_OK)
+	while (deflate(&stream, Z_FULL_FLUSH) == Z_OK)
		/* nothing */;

git-fsck-objects complains that the SHA1 does not match.  It
appears that sha1_file.c::unpack_sha1_rest() somehow barfs upon
seeing the full flush, but I haven't dug into it yet.

Would anybody with more experience with zlib want to help?
