On Thu, Apr 4, 2019 at 4:56 AM Christian Couder <christian.couder@xxxxxxxxx> wrote:
>
> Hi,
>
> On Thu, Apr 4, 2019 at 3:15 AM Matheus Tavares Bernardino
> <matheus.bernardino@xxxxxx> wrote:
> >
> > I've been studying the codebase and looking for older emails in the
> > ML that discussed what I want to propose as my GSoC project. In
> > particular, I found a thread about slow git commands on chromium, so
> > I reached out to them on chromium's ML to ask if it's still an
> > issue. I got the following answer:
> >
> > On Wed, Apr 3, 2019 at 1:41 PM Erik Chen <erikchen@xxxxxxxxxxxx> wrote:
> > > Yes, this is absolutely still a problem for Chrome. I filed some
> > > bugs for common operations that are slow for Chrome: git blame [1],
> > > git stash [2], git status [3]
> > > On Linux, blame is the only operation that is really problematic.
> > > On macOS and Windows ... it's hard to find a git operation that
> > > isn't slow. :(
>
> Nice investigation. About git status I wonder though if they have
> tried the possible optimizations, like untracked cache or
> core.fsmonitor.

I don't know if they did, but I suggested that they check
core.commitGraph, pack.useBitmaps and core.untrackedCache (which Duy
suggested to me in another thread).

> > I don't really know if threading would help stash and status, but I
> > think it could help blame. From the little I've read of blame's code
> > so far, my guess is that the priority queue used for the commits
> > could be an interface for a producer-consumer mechanism and, that
> > way, assign_blame's main loop could be done in parallel. And as we
> > can see at [4], that is 90% of the command's time. Does this make
> > sense?
>
> I can't really tell as I haven't studied this, but from the links in
> your email I think it kind of makes sense.
> Instead of doing assign_blame()'s main loop in parallel though, if my
> focus was only making git blame faster, I think I would first try to
> cache xdl_hash_record() results and then, if possible, to compute
> xdl_hash_record() in parallel, as it seems to be a big bottleneck and
> quite low-hanging fruit.

Hm, I see. But although it would take more effort to add threading at
assign_blame(), wouldn't it be better since more work could be done in
parallel? I think it could be implemented in the same fashion as git
grep does it.

> > But as Duy pointed out, if I recall correctly, for git blame to be
> > parallel, pack access and diff code would have to be thread-safe
> > first. And also, it seems, from what we've talked about earlier,
> > that this much wouldn't fit all together in a single GSoC. So, would
> > it be a nice GSoC proposal to try "making code used by blame
> > thread-safe", targeting a future parallelism on blame to be done
> > after GSoC?
>
> Yeah, I think it would be a nice proposal, even though it doesn't seem
> to be the most straightforward way to make git blame faster.
>
> Back in 2008 when we proposed a GSoC about creating a sequencer, it
> wasn't something that would easily fit in a GSoC, and in fact it
> didn't, but over the long run it has been very fruitful as the
> sequencer is now used by cherry-pick and rebase -i, and there are
> plans to use it even more. So unless people think it's not a good idea
> for some reason, which hasn't been the case yet, I am ok with a GSoC
> project like this.
>
> > And if so, could you please point out which files I should be
> > studying to write the planning for this proposal? (Unfortunately I
> > wasn't able to study pack access and diff code yet.
> > I got carried away looking for performance hotspots and now I'm a
> > bit behind schedule :(
>
> I don't think you need to study everything yet, and I think you
> already did a lot of studying, so I would suggest you first try to
> send soon a proposal with the information you have right now, and
> then, depending on the feedback you get and the time left (likely not
> much!!!), you might study some parts of the code a bit more later.

Thanks a lot, Christian. I'm writing my proposal and will try to send
it today.

> > Also, an implementation for fuzzy blame is being developed right
> > now [5], and Jeff (CC-ed) suggested recently another performance
> > improvement that could be done in blame [6]. So I would like to know
> > whether you think it is worth putting effort into trying to
> > parallelize it.
>
> What you would do seems compatible to me with the fuzzy blame effort
> and an effort to cache xdl_hash_record() results.
>
> Thanks,
> Christian.