I've been staring at reusing existing data while packing, and this occurred to me...

During packing, suppose that we chose to store an object in base form, undeltified. And also suppose we have that object loose in the .git/objects/??/ directory. We already have it in deflated form, but with its own header. I started wondering if we can somehow reuse this.

A short object format brush-up lesson is in order here.

 * An undeltified object in a pack is represented like this:

   (1) the header is dense, variable-size binary data that encodes the type and the inflated length;
   (2) deflated data immediately follows the header.

 * On the other hand, a loose object is represented like this:

   (1) the header looks like sprintf("%s %lu%c", type, len, 0);
   (2) concatenate the data to the header;
   (3) the SHA1 checksum of the above becomes the object name;
   (4) deflate the header and data using the same z_stream, in two steps, like this (sha1_file.c::write_sha1_file):

	/* Compress it */
	stream.next_out = compressed;
	stream.avail_out = size;

	/* First header.. */
	stream.next_in = hdr;
	stream.avail_in = hdrlen;
	while (deflate(&stream, 0) == Z_OK)
		/* nothing */;

	/* Then the data itself.. */
	stream.next_in = buf;
	stream.avail_in = len;
	while (deflate(&stream, Z_FINISH) == Z_OK)
		/* nothing */;
	deflateEnd(&stream);
	size = stream.total_out;

So I thought... if we cause a full flush after the header part, I can find the flush boundary in a loose object file and copy the rest into the packfile I am generating, after placing the binary encoded header. If this works, we do not have to inflate a loose object to read it and then deflate it again to store it in the pack. We will get better packing as well, since we deflate loose objects with Z_BEST_COMPRESSION, while packs are done with Z_DEFAULT_COMPRESSION.

When pack-objects reads from a loose object and detects that there is no full flush after the header, it would do the traditional inflate-deflate cycle, so this would be backward compatible.
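To make the difference between the two headers concrete, here is a minimal sketch of both encodings. It is an illustration only, not the code in the tree: the function names pack_header() and loose_header() are made up for this example, and the numeric type codes (commit=1, tree=2, blob=3, tag=4) are stated here as an assumption about the pack format. The pack header puts a continuation bit, three type bits and the low four size bits in the first byte, then seven more size bits per following byte, least significant first.

```c
#include <stdio.h>
#include <stddef.h>

/* Type codes assumed for the pack header (illustrative names). */
enum { OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

/*
 * Hypothetical helper: write the dense, variable-size pack header.
 * First byte: bit 7 = "more bytes follow", bits 6-4 = type,
 * bits 3-0 = low 4 bits of the inflated size.
 * Each later byte: bit 7 = "more bytes follow", bits 6-0 = next
 * 7 bits of the size, least significant group first.
 * buf needs room for 10 bytes with a 64-bit unsigned long.
 * Returns the number of bytes written.
 */
static size_t pack_header(int type, unsigned long size, unsigned char *buf)
{
	size_t n = 0;
	unsigned char c = (unsigned char)((type << 4) | (size & 0x0f));

	size >>= 4;
	while (size) {
		buf[n++] = c | 0x80;	/* flag that another byte follows */
		c = size & 0x7f;
		size >>= 7;
	}
	buf[n++] = c;
	return n;
}

/*
 * Hypothetical helper: write the loose object header, i.e. the
 * text "<type> <len>" followed by a NUL byte.
 * Returns the header length including the NUL, or 0 if buf is
 * too small.
 */
static size_t loose_header(const char *type, unsigned long len,
			   char *buf, size_t bufsz)
{
	int n = snprintf(buf, bufsz, "%s %lu", type, len);

	if (n < 0 || (size_t)n >= bufsz)
		return 0;
	return (size_t)n + 1;	/* count the terminating NUL */
}
```

For a 300-byte blob, pack_header() yields the two bytes 0xbc 0x12, while loose_header() yields the nine bytes "blob 300" plus the NUL. The loose header is deflated together with the data, which is why the deflated payload cannot simply be copied into a pack as-is.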
However, I am stuck on the first step, which is to do a full flush after the header. An obvious change to the code quoted above writes out a corrupt object:

 	/* First header.. */
 	stream.next_in = hdr;
 	stream.avail_in = hdrlen;
-	while (deflate(&stream, 0) == Z_OK)
+	while (deflate(&stream, Z_FULL_FLUSH) == Z_OK)
 		/* nothing */;

git-fsck-objects complains that the sha1 does not match. It appears that sha1_file.c::unpack_sha1_rest() somehow barfs upon seeing the full flush, but I haven't dug into it yet.

Would anybody with more experience with zlib want to help?