Re: git-index-pack really does suck..

On 4/3/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Junio, Nico,
>  I think we need to do something about it.
>
> CLee was complaining about git-index-pack on #irc with the partial KDE
> repo, and while I don't have the KDE repo, I decided to investigate a bit.
>
> Even with just the kernel repo (with a single 170MB pack-file), I can do
>
>         git index-pack --stdin --fix-thin new.pack < .git/objects/pack/pack-*.pack
>
> and it uses 52s of CPU-time, and on my 4GB machine it actually started
> doing IO and swapping, because git-index-pack grew to 4.8GB in size. So
> while I initially thought I'd want a bigger test-case to see the problem,
> I sure as heck don't.

There's another issue here.

I'm running git-index-pack as part of a workflow like so:

$ git-verify-pack -v .git/objects/pack/*.idx > /tmp/all-objects
$ grep 'blob' /tmp/all-objects > /tmp/blob-objects
$ cat /tmp/blob-objects | awk '{print $1;}' | git-pack-objects \
      --delta-base-offset --all-progress --stdout > blob.pack
$ git-index-pack -v blob.pack
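
For anyone who wants to reproduce this without the temporary files, the
blob selection can probably be done in one filter. This is only a sketch:
blob_ids is a made-up helper name, and it assumes the usual
"git verify-pack -v" column layout (object ID first, object type second).
Matching on the type column rather than grepping the whole line avoids
picking up lines that merely contain the string "blob" somewhere else.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): read `git verify-pack -v` output
# on stdin and print the object ID of every blob.
# Assumes the column layout: <id> <type> <size> <size-in-pack> <offset> ...
blob_ids() {
    # Summary lines ("chain length = ...", "...: ok") never have "blob"
    # as their second field, so they are skipped.
    awk '$2 == "blob" { print $1 }'
}

# Assumed usage, mirroring the commands above:
#   git verify-pack -v .git/objects/pack/*.idx | blob_ids |
#       git pack-objects --delta-base-offset --all-progress --stdout > blob.pack
```

The resulting blob.pack should be equivalent to the one produced by the
grep/awk steps, as long as no other line in the listing mentions "blob".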

Now, when I run 'git-index-pack' on blob.pack in the current
directory, memory usage is pretty horrific (even with the patch applied
to not leak everything). Shawn tells me that index-pack
should only be decompressing the object twice - once from the repo and
once from blob.pack - iff I call git-index-pack with --stdin, which I
am not.

If I move the blob.pack into /tmp, and run git-index-pack on it there,
it completes much faster and the memory usage never exceeds 200MB.
(Inside the repo, it takes up over 3G of RES according to top.)

By "much faster", I mean: the entire pack was indexed and completed in
17:42.40, whereas I cancelled the inside-the-repo index because after
56 minutes it was only at 46%.
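
To make the inside-the-repo versus /tmp comparison repeatable, a small
wrapper along these lines might help. It is a sketch only: time_in_dir is a
hypothetical name, the paths in the usage comment are placeholders, and it
reports wall-clock seconds only, so memory still has to be watched
separately (with top, as above).

```shell
#!/bin/sh
# Hypothetical helper: run a command from a given directory and report
# how many wall-clock seconds it took, plus its exit status.
# Usage: time_in_dir DIR COMMAND [ARGS...]
time_in_dir() {
    dir=$1
    shift
    start=$(date +%s)
    # Run in a subshell so the caller's working directory is untouched.
    ( cd "$dir" && "$@" )
    status=$?
    end=$(date +%s)
    # Report on stderr so the command's own stdout stays clean.
    echo "dir=$dir seconds=$((end - start)) exit=$status" >&2
    return "$status"
}

# Assumed usage (paths are placeholders):
#   time_in_dir /path/to/repo git index-pack -v blob.pack
#   time_in_dir /tmp          git index-pack -v blob.pack
```

Running the same command through the same wrapper in both places at least
rules out differences in how the two runs were invoked.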

(And, as far as getting this huge repo published - I have it burned to
a DVD, and I'm going to drop it in the mail to hpa today on my lunch
break, which I should be taking soon.)

-clee