Re: Compression and dictionaries

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> The zlib doc says to put your most common strings into the fixed
> dictionary. If a string isn't in the fixed dictionary it will get
> handled with an internal dictionary entry.  By default zlib runs with
> an empty fixed dictionary and handles everything with the internal
> dictionary.
 
> Since we are encoding C many strings will always be present (if,
> static, define, const, char, include, int, void, while, continue,
> etc).  Do you have any tools to identify the top 500 strings in C
> code? The fixed dictionary would get hardcoded into the git apps.
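
One rough way to find candidate strings is to count word-like tokens
across a source tree. The sketch below uses Python for brevity;
top_tokens and build_dictionary are hypothetical helper names, not
existing git or zlib tools, and the token pattern and the space
separator are assumptions about what a useful "string" looks like.
The zlib manual suggests placing the most common strings toward the
end of the dictionary, so the sketch orders them that way.

```python
import re
from collections import Counter

# Assumption: a "string" worth counting is a C identifier or keyword.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def top_tokens(sources, limit=500):
    """Return the `limit` most frequent tokens found in the given texts."""
    counts = Counter()
    for text in sources:
        counts.update(TOKEN.findall(text))
    return [token for token, _ in counts.most_common(limit)]

def build_dictionary(tokens):
    """Join tokens into dictionary bytes, most frequent token last.

    `tokens` is expected in most-frequent-first order, as returned by
    top_tokens(), so it is reversed before joining.
    """
    return b" ".join(t.encode("ascii") for t in reversed(tokens))
```

Feeding it the contents of every .c and .h file in a tree would give a
first-cut dictionary. Longer multi-token sequences are not captured by
this token-level count.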

Actually GIT itself may also benefit from other strings beyond
those commonly found in C-like languages:

	'100644 '
	'40000 '
	'parent '
	'tree '
	'author '
	'committer '

as these occur frequently in trees and commits.
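
To show how such a preset dictionary is passed to zlib, here is a
minimal sketch using Python's zlib module (the C equivalents are
deflateSetDictionary and inflateSetDictionary). GIT_DICT, deflate and
inflate are hypothetical names for illustration, and the dictionary
contents are only the strings listed above, not a tuned set.

```python
import zlib

# Hypothetical preset dictionary built from the strings listed above.
GIT_DICT = b"100644 40000 parent tree author committer "

def deflate(payload, zdict=None):
    """Compress payload, optionally priming zlib with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return comp.compress(payload) + comp.flush()

def inflate(stream, zdict=None):
    """Decompress stream; zdict must match the one used to compress."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(stream) + decomp.flush()
```

The reader has to supply exactly the same dictionary bytes as the
writer, which is why the dictionary would have to be hardcoded.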
 
> A fixed dictionary could conceivably take 5-10% off the size of each entry.

Could be an interesting experiment to see if that's really true
for common loads (e.g. the kernel repo).  I don't think anyone has
tried it.
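
A sketch of how that measurement could be made, again in Python. The
names compressed_size and dictionary_savings are hypothetical, the
default compression level is an assumption, and the object payloads
would have to be pulled out of a repository by some other means.

```python
import zlib

def compressed_size(payload, zdict=None):
    """Return the zlib-compressed size of payload in bytes."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return len(comp.compress(payload) + comp.flush())

def dictionary_savings(objects, zdict):
    """Compare total compressed size with and without a preset dictionary.

    Returns (plain_bytes, primed_bytes, fraction_saved). The fraction is
    negative if the dictionary makes the total larger.
    """
    plain = sum(compressed_size(obj) for obj in objects)
    primed = sum(compressed_size(obj, zdict) for obj in objects)
    saved = 1.0 - primed / plain if plain else 0.0
    return plain, primed, saved
```

Each object is compressed on its own here, so the result reflects the
per-entry effect being discussed rather than the size of a whole pack.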

-- 
Shawn.