[PATCH 0/9] Prefix-compress on-disk index entries

This is still rough, but with this patch I am getting:

    $ ls -l .git/index*
    -rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
    -rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4

in a clone of the WebKit repository that has 183175 paths.

With a hot cache and no local modifications:

    $ time sh -c 'GIT_INDEX_FILE=.git/index-4 git diff'
    real  0m0.469s
    user  0m0.130s
    sys   0m0.330s

    $ time sh -c 'git diff'
    real  0m0.677s
    user  0m0.290s
    sys   0m0.370s

which is measuring the time needed to read the index into the in-core
structure and to compare the cached stat information with what lstat(2)
reports.

The updated format is not documented yet, as I didn't intend (and I still
am not committed) to declare a change along this line the official "v4"
format; I was merely curious to see how much improvement we can get
from a trivial approach like this.

The saving of the on-disk index size comes from two factors:

 - Not padding the on-disk index entries to 8-byte boundary;

 - Not storing the full pathname for each entry in the on-disk format.
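
To get a feel for the first factor, the arithmetic can be sketched as
below.  This is only an illustration: the 62-byte size of the fixed part
of an entry and the "1 to 8 NULs" padding rule are my reading of the
existing on-disk layout, and the sketch_* names are made up for this
example.

```c
#include <stddef.h>

/*
 * Assumed size of the fixed part of an existing on-disk entry
 * (stat data, object name and flags).  Illustrative constant only.
 */
#define SKETCH_FIXED_SIZE 62

/* Entry size when the NUL-terminated name is padded to a multiple of 8. */
size_t sketch_padded_size(size_t name_len)
{
	return (SKETCH_FIXED_SIZE + name_len + 8) & ~(size_t)7;
}

/* Entry size when only the terminating NUL follows the name. */
size_t sketch_unpadded_size(size_t name_len)
{
	return SKETCH_FIXED_SIZE + name_len + 1;
}
```

Depending on the length of the name, the difference between the two is
anywhere between 0 and 7 bytes per entry.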

Because the entries are sorted by path, adjacent entries in the index tend
to share their leading path components, and it makes sense to store only
the differences in later entries.  In the v4 on-disk format of the index,
each on-disk cache entry stores the number of bytes to be stripped from
the end of the previous name, and the bytes to append to the result, to
come up with its name.
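
The writing side of this idea can be sketched as follows.  This is not
the code in the patch; prefix_compress() is a hypothetical helper that
only shows how the "to-remove" count and the bytes to append fall out of
two adjacent names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: given the previous
 * name and the current one, store the number of bytes to strip from
 * the end of the previous name in *to_remove and return the part of
 * the current name that has to be appended to the result.
 */
const char *prefix_compress(const char *prev, const char *cur,
			    size_t *to_remove)
{
	size_t common = 0;

	/* Find the length of the shared leading part. */
	while (prev[common] && prev[common] == cur[common])
		common++;

	*to_remove = strlen(prev) - common;
	return cur + common;
}
```

For the very first entry the previous name is the empty string, so the
count is 0 and the whole name is stored.  For a pair such as
"Source/WebCore/dom/Node.cpp" followed by "Source/WebCore/dom/Node.h",
the count is 3 and only "h" needs to be stored.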

The "to-remove" count is encoded in the varint format used in the
packfiles, and the "bytes-to-append" is a simple NUL-terminated string.
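
The reading side might look roughly like the sketch below.  The varint
decoder is modeled on the offset encoding used in packfiles (7 bits per
byte, most significant group first, with an implicit "+ 1" on each
continuation); whether varint.c in this series matches it byte for byte
is an assumption, and the sketch_* names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode one variable-length integer and advance *bufp past it.
 * Assumption: same scheme as the packfile offset encoding.  Overflow
 * checking is left out to keep the sketch short.
 */
uintmax_t sketch_decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/*
 * On entry, "name" holds the previous entry's name (empty for the
 * first entry) and "in" points at the "to-remove" count.  On success
 * "name" holds the current entry's name and the return value points
 * just past the NUL that ends the bytes-to-append.  Returns NULL if
 * the data is inconsistent or the caller's buffer is too small.
 */
const unsigned char *sketch_expand_name(const unsigned char *in,
					char *name, size_t name_size)
{
	size_t prev_len = strlen(name);
	uintmax_t to_remove = sketch_decode_varint(&in);
	size_t keep, add;

	if (to_remove > prev_len)
		return NULL;	/* cannot strip more than we have */
	keep = prev_len - (size_t)to_remove;
	add = strlen((const char *)in);
	if (keep + add + 1 > name_size)
		return NULL;	/* result would not fit */
	memcpy(name + keep, in, add + 1);	/* copies the NUL too */
	return in + add + 1;
}
```

A reader would call sketch_expand_name() once per entry with the same
name buffer, so that each call starts from the name left by the previous
one.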

Junio C Hamano (9):
  varint: make it available outside the context of pack
  cache.h: hide on-disk index details
  read-cache.c: allow unaligned mapping of the index file
  read-cache.c: make create_from_disk() report number of bytes it consumed
  read-cache.c: report the header version we do not understand
  read-cache.c: move code to copy ondisk to incore cache to a helper function
  read-cache.c: move code to copy incore to ondisk cache to a helper function
  read-cache.c: read prefix-compressed names in index on-disk version v4
  read-cache.c: write index v4 format

 Makefile               |    2 +
 builtin/update-index.c |    2 +
 cache.h                |   52 +---------
 config.c               |   11 ++
 environment.c          |    1 +
 read-cache.c           |  259 ++++++++++++++++++++++++++++++++++++++++--------
 varint.c               |   29 ++++++
 varint.h               |    9 ++
 8 files changed, 275 insertions(+), 90 deletions(-)
 create mode 100644 varint.c
 create mode 100644 varint.h

-- 
1.7.10.rc4.54.g1d5dd3
