Re: Questions on GSoC 2019 Ideas

Jeff King <peff@xxxxxxxx> · Mon, 4 Mar 2019 23:51:40 -0500

On Sun, Mar 03, 2019 at 05:12:59PM +0700, Duy Nguyen wrote:

> On Sun, Mar 3, 2019 at 2:18 PM Christian Couder
> <christian.couder@xxxxxxxxx> wrote:
> > One thing I am still worried about is if we are sure that adding
> > parallelism is likely to get us a significant performance improvement
> > or not. If the performance of this code is bounded by disk or memory
> > access, then adding parallelism might not bring any benefit. (It could
> > perhaps decrease performance if memory locality gets worse.) So I'd
> > like some confirmation either by running some tests or by experienced
> > Git developers that it is likely to be a win.
> 
> This is a good point. My guess is the pack access consists of two
> parts: decompression (inflating zlib streams and resolving delta
> objects, which is just another form of compression) and actual I/O.
> The former is CPU bound and may take advantage of multiple cores.
> However, the cache we have kinda helps
> reduce CPU work load already, so perhaps the actual gain is not that
> much (or maybe we could just improve this cache to be more efficient).
> I'm adding Jeff, maybe he has done some experiments on parallel pack
> access, who knows.

Sorry, I don't have anything intelligent to add here. I do know that
`index-pack` doesn't scale well with more cores. I don't think I've ever
looked at adding parallel access to the packs themselves. I suspect it
would be tricky due to a few global variables (the pack windows, the
delta cache, etc.).

> The second good thing from parallel pack access is not about utilizing
> processing power from multiple cores, but about _not_ blocking. I
> think one example use case here is parallel checkout. While one thread
> is blocked by pack access code for whatever reason, the others can
> still continue doing other stuff (e.g. write the checked out file to
> disk) or even access the pack again to check more things out.

I'm not sure if it would help much for packs, because they're organized
to have pretty good cold-cache read-ahead behavior. But who knows until
we measure it.

I do suspect that inflating (and delta reconstruction) done in parallel
could be a win for git-grep, especially if you have a really simple
regex that is quick to search.
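
To make the shape of that idea concrete, here is a minimal sketch in
Python rather than Git's C. It is not how git-grep works: the names
`inflate_and_search`, `parallel_grep` and the `blobs` dictionary are
hypothetical, the inputs are plain zlib streams standing in for object
data, and delta reconstruction is left out entirely. As far as I know,
CPython's zlib module drops the interpreter lock while inflating, so the
threads can overlap on that part while the cheap regex match stays
effectively serial.

```python
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


def inflate_and_search(item, pattern):
    """Inflate one zlib stream and return (name, matching lines)."""
    name, compressed = item
    data = zlib.decompress(compressed)  # the CPU-heavy part
    return name, [line for line in data.splitlines() if pattern.search(line)]


def parallel_grep(blobs, regex, workers=4):
    """Search every compressed blob for a bytes regex using a thread pool."""
    pattern = re.compile(regex)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda item: inflate_and_search(item, pattern), blobs.items()
        )
        # Keep only the blobs that had at least one matching line.
        return {name: lines for name, lines in results if lines}


# Hypothetical stand-ins for compressed object contents.
blobs = {
    "a.c": zlib.compress(b"int main(void)\n{\n\treturn 0;\n}\n"),
    "b.c": zlib.compress(b"static int helper(void)\n{\n\treturn 1;\n}\n"),
    "README": zlib.compress(b"nothing to see here\n"),
}

hits = parallel_grep(blobs, rb"return")
```

Whether this kind of split pays off would still have to be measured
against a single-threaded run on a real repository.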

-Peff