Re: Git tree object storing policy

Ivan Tolstosheyev <ivan.tolstosheyev@xxxxxxxxx> writes:

> #!/usr/bin/env bash
>
> git init test
> cd test
> for i in `seq 1 10000` 
> do
> touch ${i} ; git add ${i} ; git commit -m "Add ${i}" ;
> done
> cd ..
> du -hs test
[...]
> 180 MB!!!?? and 7.4M after `git gc` - thanks to delta compression!

Most of those 180MB are wasted space in filesystem blocks (presumably
4KB each) that are mostly unused.  You should be looking at the
post-gc numbers.
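
To get a feeling for the scale, here is a back-of-the-envelope floor on
the disk space allocated before gc.  It is a sketch under assumptions,
not a measurement: it assumes nothing was packed along the way, that
every loose object occupies at least one 4KB block, and that the script
produces 10000 commits, 10000 trees and a single shared empty blob.

```shell
# Assumed figures, not measured ones.
block_bytes=4096   # presumed filesystem block size
commits=10000
trees=10000
blobs=1            # all files are empty, so they share one blob

objects=$((commits + trees + blobs))
floor_kib=$((objects * block_bytes / 1024))

echo "loose objects:    ${objects}"
echo "allocation floor: ${floor_kib} KiB (~$((floor_kib / 1024)) MiB)"
```

Larger trees need more than one block each, so the real figure is
higher; the point is only that tens of MB get allocated no matter how
little data each object holds.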

Let's see the breakdown of 'du -h .git':

0       .git/rr-cache
1.5M    .git/logs/refs/heads
1.5M    .git/logs/refs
2.9M    .git/logs
4.0K    .git/objects/info
2.8M    .git/objects/pack
2.8M    .git/objects
0       .git/branches
12K     .git/info
0       .git/remotes
88K     .git/hooks
0       .git/refs/tags
0       .git/refs/heads
0       .git/refs
6.5M    .git

So 2.9MB is git keeping a reflog of everything we did (on HEAD and on
master).  Merely storing a SHA1 for each of your 10000 operations
already takes 200K, so that's not so far off; the remaining factor of
10 is in the email, date and log message.
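
As a rough cross-check, the sketch below builds one reflog line,
assuming the usual layout (old SHA1, new SHA1, identity, timestamp,
timezone, a tab, then the message), and multiplies its length by the
number of operations.  The identity, timestamp and SHA1 values are
made-up placeholders.

```shell
# All values below are placeholders for illustration only.
old=$(printf '%040d' 0)     # 40 hex digits standing in for the old SHA1
new=$(printf '%040d' 1)     # 40 hex digits standing in for the new SHA1
ident='A U Thor <author@example.com>'
when='1300000000 +0100'
msg='commit: Add 5000'

# One line: "<old> <new> <ident> <timestamp> <tz>\t<message>\n"
line_bytes=$(printf '%s %s %s %s\t%s\n' "$old" "$new" "$ident" "$when" "$msg" | wc -c)
line_bytes=$((line_bytes))  # strip any padding that wc adds

ops=10000
echo "bytes per reflog line: ${line_bytes}"
echo "one reflog, ${ops} entries: $((line_bytes * ops / 1024)) KiB"
```

With these placeholders a single reflog comes out at roughly 1.4M,
which is in the same ballpark as the 1.5M shown for
.git/logs/refs/heads; the reflog for HEAD accounts for the rest.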

In my case 704K went into the index (not directly visible above; it
accounts for most of what sits directly in the top-level .git).  That's
also not unreasonable: merely storing the object SHA1 (20 bytes) and a
bunch of timestamps for 10000 files also gets you into the 500K
ballpark.
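
For a slightly more careful estimate, here is a sketch that assumes the
version 2 index layout: a 12-byte header, then per entry 62 fixed bytes
(timestamps, stat fields, SHA1, flags) plus the path, NUL-padded to a
multiple of 8, and a 20-byte trailing checksum.  It ignores any index
extensions, so treat the result as approximate.

```shell
# Assumed layout of an index v2 file without extensions.
header=12      # signature, version, entry count
fixed=62       # per-entry stat data, 20-byte SHA1 and flags
trailer=20     # SHA1 checksum over the index

total=$((header + trailer))
i=1
while [ "$i" -le 10000 ]; do
    name_len=${#i}    # the files are named 1, 2, ..., 10000
    # Pad with 1 to 8 NUL bytes up to the next multiple of 8.
    entry=$(( ((fixed + name_len) / 8 + 1) * 8 ))
    total=$((total + entry))
    i=$((i + 1))
done

echo "estimated index size: ${total} bytes (~$((total / 1024)) KiB)"
```

Under these assumptions the estimate lands at about 703 KiB, which
agrees well with the 704K reported above.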

The pack index takes only about 500K, even though it is indexing 10000
trees and 10000 commits; the SHA1s alone (20000 * 20 bytes) already get
you into the 400K ballpark.
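
The same kind of estimate works for the pack index.  This sketch assumes
the version 2 .idx layout: an 8-byte header, a 256-entry fanout table,
then for each object a 20-byte SHA1, a 4-byte CRC32 and a 4-byte offset,
followed by two 20-byte checksums.  The object count assumes 10000
commits, 10000 trees and one shared empty blob.

```shell
objects=$((10000 + 10000 + 1))   # commits + trees + one empty blob (assumed)

header=8                    # magic number and version
fanout=$((256 * 4))         # 256 four-byte counters
per_object=$((20 + 4 + 4))  # SHA1 + CRC32 + pack offset
trailer=$((20 + 20))        # pack checksum + index checksum

sha_bytes=$((objects * 20))
idx_bytes=$((header + fanout + objects * per_object + trailer))

echo "SHA1 table alone: ${sha_bytes} bytes"
echo "estimated .idx:   ${idx_bytes} bytes (~$((idx_bytes / 1024)) KiB)"
```

So the SHA1 table is the bulk of the file, and the per-object CRC32 and
offset fields make up most of the remainder.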

That leaves only 2.3MB for the actual pack (which contains all the
data!).  But every commit must store a tree and a parent, so there are
at least 2*10000*20 = 400K incompressible bytes in the commits
already[*].  So we are within a factor of 6 of just the data required to
save the shape of your history DAG, no content included.  I'd say that's
not too bad.
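
Spelled out as a quick calculation, with the pack size rounded to 2300K
purely for the sake of the estimate:

```shell
commits=10000
sha1_bytes=20
refs_per_commit=2     # one tree and one parent (the root commit has no
                      # parent, which is ignored here)

dag_bytes=$((commits * refs_per_commit * sha1_bytes))
pack_bytes=$((2300 * 1000))   # approximate pack size, rounded

# Integer arithmetic only, so scale by 10 to keep one decimal place.
ratio_x10=$((pack_bytes * 10 / dag_bytes))
echo "DAG shape: ${dag_bytes} bytes"
echo "pack / DAG ratio: $((ratio_x10 / 10)).$((ratio_x10 % 10))"
```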


[*] This is not quite true: the parents and trees could be stored as
pointers within the pack.  AFAIK the proposed pack v4 format does this,
and would yield more efficient compression.  So if you're going to
waste energy worrying about this, you should help with pack v4.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch