Hi Junio,

I seem to have completely missed the earlier series at

  http://thread.gmane.org/gmane.comp.version-control.git/194660

My bad.  Thomas has been working on a prototype converter over the past
few days, with results similar to (but not quite as good as) your
numbers

  $ ls -l .git/index*
  -rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
  -rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4

while taking a different approach with different tradeoffs.

Nevertheless...

> + (Version 4) In version 4, the entry path name is prefix-compressed
> + relative to the path name for the previous entry (the very first
> + entry is encoded as if the path name for the previous entry is an
> + empty string). At the beginning of an entry, an integer N in the
> + variable width encoding (the same encoding as the offset is encoded
> + for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
> + by a NUL-terminated string S. Removing N bytes from the end of the
> + path name for the previous entry, and replacing it with the string S
> + yields the path name for this entry.
[..]
> + (Version 4) In version 4, the padding after the pathname does not
> + exist.

I think there are actually several separate ideas here:

* The prefix compression.  Thomas is not using this idea; we've been
  toying with making the index bisectable (within each directory) for
  fast single-entry lookups, which inherently conflicts with this.  The
  directory-like layout partially achieves the same (elides common path
  components).

* The varint encoding (or offset encoding, but "varint" is something
  you can google :-).  David suggested using it on stat() data, combined
  with zigzag encoding and delta against the first entry in the
  directory, which gives some good compression results.  Profiling will
  have to say whether the extra decoding effort is worth the space
  savings.

* The lack of variable padding, which is a good idea -- in any case I
  seem to remember Shawn complaining about it.
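
To make the first two ideas concrete, here is a rough sketch in Python
(for brevity, not git's C).  It follows the quoted description for the
path names: an integer N in the OFS_DELTA offset encoding, then a
NUL-terminated suffix S.  The stat() part is only a guess at what
"zigzag plus delta against the first entry" could look like; it reuses
the same offset encoding where another varint flavour may have been
meant.  All function names are made up for illustration and none of
this is the actual patch or the prototype converter.

```python
def encode_offset_varint(value):
    """Variable-width encoding as used for OFS_DELTA offsets in packs."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = [value & 0x7F]                 # last byte: no continuation bit
    value >>= 7
    while value:
        value -= 1                       # the "minus one" bias of this format
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))          # most significant group first


def decode_offset_varint(buf, pos=0):
    """Return (value, position just past the encoded integer)."""
    c = buf[pos]
    pos += 1
    value = c & 0x7F
    while c & 0x80:
        value += 1
        c = buf[pos]
        pos += 1
        value = (value << 7) + (c & 0x7F)
    return value, pos


def compress_paths(paths):
    """Prefix-compress sorted path names (bytes) against the previous one."""
    out = bytearray()
    prev = b""                           # first entry: previous path is empty
    for path in paths:
        common = 0
        limit = min(len(prev), len(path))
        while common < limit and prev[common] == path[common]:
            common += 1
        strip = len(prev) - common       # N: bytes to drop from the end of prev
        out += encode_offset_varint(strip) + path[common:] + b"\0"
        prev = path
    return bytes(out)


def decompress_paths(data):
    """Inverse of compress_paths()."""
    paths, prev, pos = [], b"", 0
    while pos < len(data):
        strip, pos = decode_offset_varint(data, pos)
        end = data.index(b"\0", pos)     # S is NUL-terminated
        path = prev[:len(prev) - strip] + data[pos:end]
        paths.append(path)
        prev, pos = path, end + 1
    return paths


def zigzag_encode(n):
    """Map signed to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z):
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def encode_stat_deltas(values):
    """Assumed scheme: first value verbatim, the rest as zigzag deltas
    against the first value.  Expects a non-empty list of integers."""
    first = values[0]
    out = bytearray(encode_offset_varint(first))
    for v in values[1:]:
        out += encode_offset_varint(zigzag_encode(v - first))
    return bytes(out)


def decode_stat_deltas(data):
    first, pos = decode_offset_varint(data, 0)
    values = [first]
    while pos < len(data):
        z, pos = decode_offset_varint(data, pos)
        values.append(first + zigzag_decode(z))
    return values
```

Note that decompress_paths() has to walk the entries in order, since
each path depends on the one before it.  That is exactly why prefix
compression and a bisectable index do not mix.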

-- 
Thomas Rast
trast@{inf,student}.ethz.ch