Re: git-index-pack really does suck..

On Tue, 3 Apr 2007, Linus Torvalds wrote:

> 
> 
> On Tue, 3 Apr 2007, Chris Lee wrote:
> > 
> > There's another issue here.
> > 
> > I'm running git-index-pack as part of a workflow like so:
> > 
> > $ git-verify-pack -v .git/objects/pack/*.idx > /tmp/all-objects
> > $ grep 'blob' /tmp/all-objects > /tmp/blob-objects
> > $ cat /tmp/blob-objects | awk '{print $1;}' | git-pack-objects \
> >     --delta-base-offset --all-progress --stdout > blob.pack
> > $ git-index-pack -v blob.pack
> > 
> > Now, when I run 'git-index-pack' on blob.pack in the current
> > directory, memory usage is pretty horrific (even with the applied
> > patch to not leak everything). Shawn tells me that index-pack
> > should only be decompressing the object twice - once from the repo and
> > once from blob.pack - iff I call git-index-pack with --stdin, which I
> > am not.
> > 
> > If I move the blob.pack into /tmp, and run git-index-pack on it there,
> > it completes much faster and the memory usage never exceeds 200MB.
> > (Inside the repo, it takes up over 3G of RES according to top.)
> 
> Yeah. What happens is that inside the repo, because we do all the 
> duplicate object checks (verifying that there are no evil hash collisions) 
> even after fixing the memory leak, we end up keeping *track* of all those 
> objects.

What do you mean?

> And with a large repository, it's quite the expensive operation.
> 
> That whole "verify no SHA1 hash collision" code is really pretty damn 
> paranoid. Maybe we shouldn't have it enabled by default.

Maybe we shouldn't run index-pack on packs for which we _already_ have 
an index, which is the most likely reason for the collision check to 
trigger in the first place.

This is in the same category as trying to run unpack-objects on a pack 
within a repository and wondering why it doesn't work.
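
A minimal sketch of skipping index-pack when an index already exists, 
assuming the index sits beside the pack under the same base name.  The 
helper name index_pack_if_needed is hypothetical and not part of git; 
this only illustrates the idea, it is not a proposed patch.

```shell
# Hypothetical helper (not part of git): index a pack only when no
# .idx file with the same base name already sits next to it.
index_pack_if_needed() {
    pack=$1
    idx=${pack%.pack}.idx    # blob.pack -> blob.idx

    if [ -e "$idx" ]; then
        # An index is already there, so re-indexing would be redundant.
        echo "skip: $idx already exists"
        return 0
    fi

    # Spelled git-index-pack elsewhere in this thread.
    git index-pack -v "$pack"
}
```

Called as "index_pack_if_needed blob.pack", it only invokes index-pack 
when blob.idx is missing.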

> So how about this updated patch? We could certainly make "git pull" imply 
> "--paranoid" if we want to, but even that is likely pretty unnecessary. 

I'm of the opinion that this patch is unnecessary.  It only helps in 
bogus workflows to start with, and it makes the default behavior unsafe 
(unsafe from a paranoid pov, but still).  And in the _normal_ workflow 
it should never trigger.

So I wouldn't merge it.


Nicolas