This is still rough, but with this patch I am getting:

    $ ls -l .git/index*
    -rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
    -rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4

in a clone of WebKit repository that has 183175 paths.

With hot-cache with no local modification:

    $ time sh -c 'GIT_INDEX_FILE=.git/index-4 git diff'

    real    0m0.469s
    user    0m0.130s
    sys     0m0.330s

    $ time sh -c 'git diff'

    real    0m0.677s
    user    0m0.290s
    sys     0m0.370s

which measures the time needed to read the index into the in-core
structure and to compare the cached stat information with that taken
from lstat(2).

The updated format is not documented yet, as I didn't intend (and I
still am not committed) to declare a change along this line the
official "v4" format; I was merely curious to see how much improvement
we can get from a trivial approach like this.

The saving of the on-disk index size comes from two factors:

 - Not padding the on-disk index entries to 8-byte boundary;

 - Not storing the full pathname for each entry in the on-disk format.

Because the entries are sorted by path, adjacent entries in the index
tend to share their leading path components, and it makes sense to
store only the differences in later entries. In the v4 on-disk format
of the index, each on-disk cache entry stores the number of bytes to
be stripped from the end of the previous name, and the bytes to append
to the result, to come up with its name. The "to-remove" count is
encoded in the varint format used in the packfiles, and the
"bytes-to-append" is a simple NUL-terminated string.
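
To make the name encoding above concrete, here is a minimal sketch in C. It
is not the code from this series: every sketch_* name is hypothetical, the
varint is assumed to be the offset-style encoding found in packfiles, and
the decoder omits the overflow checks real code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed varint: 7 bits per byte, most significant group first, high
 * bit set on every byte except the last, with a +1 bias per
 * continuation byte. Returns the number of bytes written to out.
 */
size_t sketch_encode_varint(uintmax_t value, unsigned char *out)
{
	unsigned char tmp[16];
	size_t pos = sizeof(tmp) - 1;

	tmp[pos] = value & 127;
	while (value >>= 7)
		tmp[--pos] = 128 | (--value & 127);
	memcpy(out, tmp + pos, sizeof(tmp) - pos);
	return sizeof(tmp) - pos;
}

/* Inverse of the above; advances *bufp past the bytes consumed. */
uintmax_t sketch_decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/*
 * Write "name" relative to "prev": the to-remove count as a varint,
 * then the bytes to append, NUL-terminated. Returns bytes written;
 * the caller must supply a large enough buffer.
 */
size_t sketch_write_name(const char *prev, const char *name,
			 unsigned char *out)
{
	size_t prev_len = strlen(prev), common = 0, n;

	while (prev[common] && prev[common] == name[common])
		common++;
	n = sketch_encode_varint(prev_len - common, out);
	strcpy((char *)out + n, name + common);
	return n + strlen(name + common) + 1;
}

/*
 * "name" holds the previous entry's name on entry and the current
 * entry's name on return. Returns a pointer just past the bytes read.
 */
const unsigned char *sketch_read_name(const unsigned char *in,
				      char *name, size_t cap)
{
	size_t len = strlen(name);
	size_t strip = (size_t)sketch_decode_varint(&in);
	size_t add = strlen((const char *)in);

	assert(strip <= len && len - strip + add < cap);
	memcpy(name + len - strip, in, add + 1);
	return in + add + 1;
}
```

With "read-cache.c" as the previous name, "read-tree.c" shares the first
five bytes, so the sketch writes a to-remove count of 7 followed by
"tree.c" and a NUL: 8 bytes where the full name with its NUL takes 12.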

Junio C Hamano (9):
  varint: make it available outside the context of pack
  cache.h: hide on-disk index details
  read-cache.c: allow unaligned mapping of the index file
  read-cache.c: make create_from_disk() report number of bytes it consumed
  read-cache.c: report the header version we do not understand
  read-cache.c: move code to copy ondisk to incore cache to a helper function
  read-cache.c: move code to copy incore to ondisk cache to a helper function
  read-cache.c: read prefix-compressed names in index on-disk version v4
  read-cache.c: write index v4 format

 Makefile               |    2 +
 builtin/update-index.c |    2 +
 cache.h                |   52 +--------
 config.c               |   11 ++
 environment.c          |    1 +
 read-cache.c           |  259 ++++++++++++++++++++++++++++++++++++++++-------
 varint.c               |   29 ++++++
 varint.h               |    9 ++
 8 files changed, 275 insertions(+), 90 deletions(-)
 create mode 100644 varint.c
 create mode 100644 varint.h

--
1.7.10.rc4.54.g1d5dd3