On Sun, Mar 03, 2019 at 05:12:59PM +0700, Duy Nguyen wrote:

> On Sun, Mar 3, 2019 at 2:18 PM Christian Couder
> <christian.couder@xxxxxxxxx> wrote:
> > One thing I am still worried about is if we are sure that adding
> > parallelism is likely to get us a significant performance improvement
> > or not. If the performance of this code is bounded by disk or memory
> > access, then adding parallelism might not bring any benefit. (It could
> > perhaps decrease performance if memory locality gets worse.) So I'd
> > like some confirmation either by running some tests or by experienced
> > Git developers that it is likely to be a win.
>
> This is a good point. My guess is the pack access consists of two
> parts: deflate zlib, resolve delta objects (which is just another form
> of compression) and actual I/O. The former is CPU bound and may take
> advantage of multiple cores. However, the cache we have kinda helps
> reduce CPU work load already, so perhaps the actual gain is not that
> much (or maybe we could just improve this cache to be more efficient).
> I'm adding Jeff, maybe he has done some experiments on parallel pack
> access, who knows.

Sorry, I don't have anything intelligent to add here. I do know that
`index-pack` doesn't scale well with more cores. I don't think I've ever
looked at adding parallel access to the packs themselves. I suspect it
would be tricky due to a few global variables (the pack windows, the
delta cache, etc).

> The second good thing from parallel pack access is not about utilizing
> processing power from multiple cores, but about _not_ blocking. I
> think one example use case here is parallel checkout. While one thread
> is blocked by pack access code for whatever reason, the others can
> still continue doing other stuff (e.g. write the checked out file to
> disk) or even access the pack again to check more things out.

I'm not sure if it would help much for packs, because they're organized
to have pretty good cold-cache read-ahead behavior.
But who knows until we measure it.

I do suspect that inflating (and delta reconstruction) done in parallel
could be a win for git-grep, especially if you have a really simple
regex that is quick to search.

-Peff