On 15 Aug 2006 04:33:03 -0400, linux@xxxxxxxxxxx <linux@xxxxxxxxxxx> wrote:
Just to inform this discussion, a brief description of the zlib "deflate" algorithm. A full description is in RFC1951. This is LZ77 compression, meaning that what is actually encoded is a series of instruction to either - Insert a specified byte, or - Copy a run of n bytes, starting m bytes ago.
Shawn and I did some testing overnight. A simple 4KB preload table yielded a 4% size reduction in my 850MB pack. I'm still hoping for 10% using an optimal 32K table. Paper on computing the optimal preload table: http://www.eecs.harvard.edu/~michaelm/postscripts/dcc2001a.pdf If anyone is interested in doing some research, it would be interesting to build a pack engine based on full-text search technology. For example cLucene, http://clucene.sourceforge.net/index.php/Main_Page The full-text search app contains the code for extracting the dictionary and then encoding all of the blobs using it. My prediction is that a large global dictionary based scheme will compress significantly better than zlib is capable of. -- Jon Smirl jonsmirl@xxxxxxxxx - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html