On Fri, Feb 20, 2015 at 7:09 PM, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>>> But actually most of "git fetch" is spent in the reachability check
>>> subsequently done by "git-rev-list", which takes several seconds. I
>>
>> I wonder if reachability bitmaps could help here..
>
> I could have sworn I had that enabled already, but evidently not. I did
> test it and it cut down on clone times a bit. Now our daily repacking
> is:
>
> git --git-dir={} gc &&
> git --git-dir={} pack-refs --all --prune &&
> git --git-dir={} repack -Ad --window=250 --depth=100 \
>     --write-bitmap-index --pack-kept-objects &&
>
> It's not clear to me from the documentation whether this should just
> be enabled on the server, or on the clients too. In any case I've
> enabled it on both.

Pack bitmaps matter most on the server side. What I was not sure about
was whether they help the client side as well, since you run rev-list
on the client for the reachability test. But thinking about it again,
I don't think enabling pack bitmaps on the client helps much. The
"--not --all" part of rev-list basically just traverses commits, not
trees and blobs (which is where pack bitmaps shine). The big problem
here is "--all", which goes and examines all refs. So it's the big
ref-count problem again..

> Even then, with it enabled on both, a "git pull" that pulls down just
> one commit on one branch takes 13s. Trace attached at the end of the
> mail.
>
>>> haven't looked into it but there's got to be room for optimization
>>> there; surely it only has to do reachability checks for new refs, or
>>> could run in some "I trust this remote not to send me corrupt data"
>>> mode (which would make sense within a company where you can trust
>>> your main Git box).
>>
>> No, it's not just about trusting the server side, it's about catching
>> data corruption on the wire as well. We have a trick to avoid the
>> reachability check in the clone case, which is much more expensive
>> than a fetch. Maybe we could do something further to help the fetch
>> case _if_ reachability bitmaps don't help.
>
> Still, if that's indeed a big bottleneck, what's the worst-case
> scenario here? That the local repository gets hosed? The server will
> still recursively validate the objects it gets sent, right?

The server is under pressure to pack and send data fast, so it does not
validate as heavily as the client does. When deltas are reused, only
the crc32 is verified. When deltas are generated, the server must
unpack some objects for deltification, but I don't think it rehashes
the content to see if it produces the same SHA-1. Single bit flips
could go unnoticed..

> I wonder if a better trade-off in that case would be to skip this in
> some situations and instead put something like "git fsck" in a
> cronjob.

Either that, or be optimistic: accept the pack (i.e. git-fetch returns
quickly) and validate it in the background. If the pack is indeed good,
you don't have to wait until validation is done. If the pack is bad,
you would know after a minute or two, and hopefully you can still
recover from that point.
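If you go the cron route, the check could be as simple as a nightly
fsck whose output goes to syslog. A minimal sketch, assuming a bare
repository at /srv/git/repo.git (the path and schedule are made up for
illustration):

    # crontab entry for the repository owner: run a full fsck at 03:00
    # every night; logger tags the output so corruption reports show up
    # in syslog rather than cron mail.
    0 3 * * * git --git-dir=/srv/git/repo.git fsck 2>&1 | logger -t git-fsck

A failed fsck only tells you after the fact, of course, so you would
still want the fetch that brought in a bad pack to be recoverable,
e.g. by keeping the old packs around until the check passes.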
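And to put a number on the reachability check itself, you can time
roughly the same traversal that fetch performs after the pack arrives.
This is a sketch, not the exact command fetch runs internally;
FETCH_HEAD stands in here for whatever tips were just fetched:

    # Walk objects from the newly fetched tip, stopping at everything
    # we already have; with a huge ref count, "--not --all" dominates.
    time git rev-list --objects FETCH_HEAD --not --all >/dev/null

--
Duy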