[PATCH 0/3] On compressing large index

I was wondering whether compressing the index might help when it
contains ~2M files. It turns out that it only makes the situation
worse. Anyway, I am posting the code and some numbers here.

The index is created artificially with the program in [1]:

$ git init
$ touch foo
$ git hash-object -w foo
$ ./a.out 256 256 32 | git update-index --index-info

That gives ~2M files in the index, 209 MB in size.
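
As a sanity check on the ~2M figure, here is a minimal sketch of the arithmetic, assuming the generator in [1] is run as shown above. The helper name count_entries is made up for illustration and is not part of the series. Each of the m1 top-level directories contributes one "foo" entry plus m2 * m3 nested entries.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the series): number of index
 * entries the generator in [1] prints for the arguments m1 m2 m3.
 * Each top-level directory gets one "foo" plus m2 * m3 nested files.
 */
unsigned long long count_entries(unsigned long long m1,
				 unsigned long long m2,
				 unsigned long long m3)
{
	return m1 * (1 + m2 * m3);
}

/* Print the count in the same style as the rest of this mail. */
void print_entries(unsigned long long m1,
		   unsigned long long m2,
		   unsigned long long m3)
{
	printf("%llu entries\n", count_entries(m1, m2, m3));
}
```

For "256 256 32" this comes to 256 * (1 + 256 * 32) = 2097408 entries, which matches the ~2M above.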

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.635s
user    0m4.258s
sys     0m0.329s

$ time ~/w/git/git update-index level-0-0000/foo
real    0m4.593s
user    0m4.264s
sys     0m0.323s

The index is then compressed by setting GIT_ZCACHE=1:

$ GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo

which gives a 6.8 MB index (the real-world number would likely be less
impressive, because the compression ratio on my artificial tree is
really high). The only problem is that git uses more time, not less:

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.970s
user    0m4.675s
sys     0m0.289s

$ time GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo
real    0m4.959s
user    0m4.682s
sys     0m0.273s

My guess is that Linux already caches the whole index in memory, so
I/O time does not really matter, while we still have to pay for zlib's
time. We need to figure out what git spends 4s of user time on.

This series may be useful on OSes that do not cache heavily, though
I'm not sure there are any out there nowadays.

Nguyễn Thái Ngọc Duy (3):
  read-cache: factor out cache entries reading code
  read-cache: reduce malloc/free during writing index
  Support compressing index when GIT_ZCACHE=1

 cache.h      |    1 +
 read-cache.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 148 insertions(+), 25 deletions(-)

[1]
-- 8< --
#include <stdio.h>
#include <stdlib.h>	/* atoi() */

int main(int argc, char **argv)
{
	const char *prefix = "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\t";
	int l1, l2, l3;
	int m1, m2, m3;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <m1> <m2> <m3>\n", argv[0]);
		return 1;
	}

	m1 = atoi(argv[1]);
	m2 = atoi(argv[2]);
	m3 = atoi(argv[3]);

	for (l1 = 0; l1 < m1; l1++) {
		printf("%slevel-0-%04d/foo\n", prefix, l1);
		for (l2 = 0; l2 < m2; l2++)
			for (l3 = 0; l3 < m3; l3++)
				printf("%slevel-0-%04d/level-1-%04d/foo-%04d\n",
				       prefix, l1, l2, l3);
	}
	return 0;
}
-- 8< --
-- 
1.7.8.36.g69ee2
