Re: git-index-pack really does suck..

On Tue, 3 Apr 2007, Linus Torvalds wrote:
> 
> So how about this updated patch? We could certainly make "git pull" imply 
> "--paranoid" if we want to, but even that is likely pretty unnecessary. 
> It's not like anybody has ever shown a SHA1 collision, and if the *local* 
> repository is corrupt (and has an object with the wrong SHA1 - that's what 
> the testsuite checks for), then it's probably good to get the valid object 
> from the remote..

Some trivial timings for indexing just the kernel pack..

Without --paranoid:

	24.61user 2.16system 0:27.04elapsed 99%CPU
	0major+14120minor pagefaults

With --paranoid:

	42.74user 3.04system 0:46.36elapsed 98%CPU
	0major+72768minor pagefaults

so it's a noticeable CPU issue, but it's even more noticeable in memory 
usage (55MB vs 284MB - pagefaults give a good way to look at how much 
memory really got allocated for the process).
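
The memory figures can be rechecked from the pagefault counts with a bit of arithmetic. This is only a rough sketch: it assumes 4 KiB pages (the page size is not stated above) and that every minor fault maps one fresh page. The name `faults_to_mib` is made up for illustration.

```python
# Assumption: 4 KiB pages, one newly mapped page per minor fault.
PAGE_SIZE = 4096

def faults_to_mib(minor_faults, page_size=PAGE_SIZE):
    """Estimate the memory touched by a process from its minor pagefault count."""
    return minor_faults * page_size / (1024 * 1024)

without_paranoid = faults_to_mib(14120)   # count from the run without --paranoid
with_paranoid = faults_to_mib(72768)      # count from the run with --paranoid

print(f"{without_paranoid:.0f}MB vs {with_paranoid:.0f}MB")
```

Under those assumptions the two counts come out at roughly 55MB and 284MB, matching the numbers quoted above.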

All that extra memory is just for SHA1 commit ID information. 

Now, clearly the usage scenario here is a bit odd (ie the case where we 
have all the objects already), so in that sense this is very much a 
worst-case situation, and you simply shouldn't *do* something like this, 
but at the same time, I'm just not convinced a very theoretical SHA1 
collision check is worth it. 

Btw, even if we don't have any of the objects, if you have tons and tons 
of objects and do a "git pull", just the *lookup* of the nonexistent 
objects will be expensive: first we won't find it in any pack, then we'll 
look at the loose objects, and then we'll look in the pack *again* due to 
the race avoidance. So looking up nonexistent objects is actually pretty 
expensive.
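
To see why a miss costs more than a hit, here is a toy model of the lookup order described above: packs first, then loose objects, then the packs once more. It is not git's actual code. The function `lookup`, the `reload_packs` callback and the use of plain sets to stand in for packs and the loose object store are all assumptions made for illustration.

```python
def lookup(sha1, packs, loose, reload_packs):
    """Toy lookup: return (found, probes), where probes counts the places searched."""
    probes = 0

    # 1. Look in every known pack.
    for pack in packs:
        probes += 1
        if sha1 in pack:
            return True, probes

    # 2. Fall back to the loose objects.
    probes += 1
    if sha1 in loose:
        return True, probes

    # 3. Look in the packs again, in case one appeared while we were
    #    searching (the race avoidance mentioned above).
    for pack in reload_packs():
        probes += 1
        if sha1 in pack:
            return True, probes

    return False, probes


packs = [{"a"}, {"b"}]   # hypothetical object names, one set per pack
loose = {"c"}

hit = lookup("a", packs, loose, lambda: packs)
miss = lookup("zzz", packs, loose, lambda: packs)
print(hit, miss)
```

In this model an object in the first pack is found after one probe, while a nonexistent object costs a probe of every pack, the loose store, and every pack again before the lookup can give up.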

In fact, "--paranoid" takes one second more for me even totally outside of 
a git repository, just because we waste so much time trying to look up 
non-existent object files ;)

			Linus

