On Friday, May 22, 2015 @ 8:12 AM Jeff King did scribble: > > In builtin/index-pack.c, replace the line "collision_test_needed = > > has_sha1_file(sha1);" with "collision_test_needed = 0;". Security is > > compromised but for this test it should be ok. Then clone again. I > > hope the new number gets down close to v1.8.4.1. > > Yeah, I think that is a good starting point. I timed a local clone > before and after 45e8a74; there is a small difference on my system > (about 5%), but it goes away with your suggestion. Tested this change on a couple of versions, first of all on the revision where things go wrong for me: ~ $ bin/git --version git version 1.8.4.1.g45e8a74.dirty ~ $ time bin/git clone https://github.com/git/git test <snip> real 0m7.105s user 0m9.566s sys 0m0.989s ~ $ time bin/git clone https://github.com/git/git /sami/test <snip> real 0m14.411s user 0m9.703s sys 0m1.374s This is more in line with what I see normally. Also tested on master: ~ $ bin/git --version git version 2.4.1.217.g6c1249c.dirty ~ $ time bin/git clone https://github.com/git/git test <snip> real 0m5.946s user 0m9.111s sys 0m1.332s ~ $ time bin/git clone https://github.com/git/git /sami/test <snip> real 0m12.344s user 0m9.187s sys 0m1.579s So similar on the latest as well. > The problem with has_sha1_file() prior to v1.8.4.2 is that it is racy > with respect to simultaneous operations; we might claim we do not have > an object, when in fact we do. As you noted, usually has_sha1_file() > returns true (i.e., we look up objects that we expect to have), and the > performance impact is minimal. > > But for code paths where _not_ having the object is normal, the impact > is much greater. So I think there are two possibilities for improving > this: > > 1. Find places where we expect the object will not exist (like the > collision_test check you pointed out) and use a > "has_sha1_file_fast" that accepts that it may very occasionally > erroneously return false. In this case it would mean potentially > skipping a collision check, but I think that is OK. That could have > security implications, but only if an attacker: > > a. has broken sha1 to generate a colliding object > > b. can manipulate the victim into repacking in a loop > > c. can manipulate the victim into fetching (or receiving a push) > simultaneously with (b) > > at which point they can try to race the repack procedure to add > their colliding object to the repository. It seems rather unlikely > (especially part a). > > 2. Make reprepare_packed_git() cheaper in the common case that nothing > has changed. It would probably be enough to stat("objects/pack"). > We know that packfiles themselves do not change; we may only add or > delete them. And since the hierarchy of objects/pack is flat, we > know that the mtime on that directory will change if any packs are > added or removed. > > Of course, we are still doing an extra stat() for each has_sha1_file > call. Whether that helps for the NFS case depends on whether stat() > is significantly cheaper than opendir/readdir/closedir. On my local > disk, the hacky patch below did seem to give me back the 5% lost by > 45e8a74 (I did it directly on master, since that old commit does > not have the stat_validity infrastructure). Also tested master with the patch provided: ~ $ bin/git --version git version 2.4.1.217.g6c1249c.dirty ~ $ time git clone https://github.com/git/git test real 0m8.263s user 0m10.550s sys 0m3.763s ~ $ time git clone https://github.com/git/git /sami/test real 1m3.286s user 0m12.149s sys 0m9.192s So the patch isn't reducing the time taken when cloning to NAS. Here are the top calls from strace % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 90.68 19.946171 46 437398 45436 futex 4.63 1.017637 22 46141 5 read 2.37 0.521161 4 141429 pread 0.73 0.161442 3 47130 9 write 0.42 0.093146 0 188645 188621 access 0.38 0.083033 26 3209 181 open 0.32 0.069587 0 188613 1146 stat 0.23 0.050855 12 4082 3925 lstat 0.11 0.023317 8 2979 1 fstat 0.04 0.009134 0 35696 3 recvfrom 0.03 0.007666 1917 4 wait4 0.02 0.004478 1 3923 madvise 0.01 0.002291 0 17858 poll 0.01 0.002155 0 17851 select Thanks for looking into this. Steve ��.n��������+%������w��{.n��������n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�