On Wed, Mar 6, 2019 at 7:17 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote: > > On Wed, Mar 6, 2019 at 6:47 AM Matheus Tavares Bernardino > <matheus.bernardino@xxxxxx> wrote: > > > > This exercise of estimating a good spot to gain performance with > > parallelism at git seems more difficult than I thought, firstly. Also, > > I'm not that familiar yet with git packing (neither with the sections > > of it that could benefit from parallelism). So could anyone point me > > some good references on this, where I could study and maybe come back > > with more valuable suggestions? > > I think you should skim through > Documentation/technical/pack-format.txt to have an idea what we're > talking about here (inflation, delta cache...). Thanks for the recommendation. It was a very explanatory reading. > The short (and slightly inaccurate) version is, in order to give the > content of a SHA-1, we need to > > 1. locate that "object" in the pack file (I'm ignoring loose objects) > > 2. assume (worst case) that this object is a delta object, it contains > only modification of another object, so you would need to get the > content of the other object first, then apply these modification ("the > delta") to get the content. This could repeat many times > > 3. once you get full content, it's actually zlib compressed, so you > need to uncompress/inflate it. > > Step 2 would be super slow if you have to uncompress that "another > object" every single time. So there's a "delta cache" which stores the > content of these "other objects". Next time you see a delta object on > top of one of these in the cache, you can just apply the delta and be > done with it. Much faster. This delta cache is of course global and > not thread safe. > > Step 3 can also be slow when dealing with large blobs. Thanks for the explanation. > Another global state that Jeff mentions is pack windows I think is a > bit harder to explain quickly. But basically we have many "windows" to > see the raw content of a pack file, these windows are global (per pack > actually) and are also not thread safe. > > So all these steps are not thread safe. When a command with multiple > thread support accesses the pack, all those steps are protected by a > single mutex (it's grep_mutex in builtin/grep.c or read_lock() in > builtin/pack-objects.c). As you can see steps here are CPU-bound (step > 3 is obvious, step 2 will have to inflate the other objects), so if > you use a more fine-grained mutex, chances are the inflation step can > be done in parallel. > > I think the good spots to gain performance are the commands that > already have multiple thread support. I mentioned git-grep and > git-pack-objects.c above. git-index-pack is the third one but it's > special and I think does not use general pack access code. > > I think if we could somehow measure lock contention of those big locks > above we can guesstimate how much gain there is. If threads in > git-grep for example often have to wait for grep_mutex, then in the > best case scenario when you make pack access thread safe, lock > contention goes down to near zero. I've been thinking on how I could implement a test to estimate the lock contention but had no success until now. I wanted to try mutrace[2] but couldn't install it; I tried valgrind's drd but it didn't seem to report a contention time estimation; And I tried measuring the time of "pthread_mutex_lock(&grep_mutex)" but I don't know how much significative this value is, since we can't directly compare it (the time of many threads at lock) with the overall execution time. Do you have an idea on how we could measure the lock contention here? Another thing that is occurring to me right now is whether git-grep, as it is implemented today, would really benefit from thread-safe pack access. I may have to study the code more, but it seems to me that just the producer thread uses pack access. [2] http://0pointer.de/blog/projects/mutrace.html > > On Tue, Mar 5, 2019 at 9:57 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote: > > > > > > On Tue, Mar 5, 2019 at 11:51 AM Jeff King <peff@xxxxxxxx> wrote: > > > > > processing power from multiple cores, but about _not_ blocking. I > > > > > think one example use case here is parallel checkout. While one thread > > > > > is blocked by pack access code for whatever reason, the others can > > > > > still continue doing other stuff (e.g. write the checked out file to > > > > > disk) or even access the pack again to check more things out. > > > > > > > > Hmm, you mean distributing the process of inflating, reconstructing > > deltas and checking out files between the threads? (having each one > > doing the process for a different file?) > > Yes. So if one thread hits a giant file (and spends lot of time > inflating), the other threads can still go on inflate smaller files > and writing them to disk. > > > > > I'm not sure if it would help much for packs, because they're organized > > > > to have pretty good cold-cache read-ahead behavior. But who knows until > > > > we measure it. > > > > > > > > I do suspect that inflating (and delta reconstruction) done in parallel > > > > could be a win for git-grep, especially if you have a really simple > > > > regex that is quick to search. > > > > > > Maybe git-blame too. But this is based purely on me watching CPU > > > utilization of one command with hot cache. For git-blame though, diff > > > code as to be thread safe too but that's another story. > > > > I don't know if this relates to parallelizing pack access, but I > > thought that sharing this with you all could perhaps bring some new > > insights (maybe even on parallelizing some other git section): I asked > > my friends who contribute to the Linux Kernel what git commands seems > > to take longer during their kernel work, and the answers were: > > - git log and git status, sometimes > > - using pager's search at git log > > - checking out to an old commit > > - git log --oneline --decorate --graph > > Sometimes the slowness is not because of serialized pack access. I'm > sure git-status is not that much impacted by slow pack access. > > The pager search case may be sped up (git-log has to look at lots of > blobs and trees) but you also need to parallelize diff code as well > and that I think is way too big to consider. > > The checking out old commit, I think could also be sped up with > parallel checkout. Again this is really out of scope. But of course it > can't be done until pack access is thread safe. Or can be done but not > as pretty [1]. [1] suggests 30% speedup. That's the share-nothing best > possible case, I think. But that code is no way optimized so real > speedup could be higher. > > [1] https://public-inbox.org/git/20160415095139.GA3985@lanh/ So, although it would be out of scope for GSoC, checkout, diff and log (and maybe others) could all benefit from a thread-safe/parallel pack access, right? If so, it is very motivating the impact this project could, in theory, have :) > -- > Duy