On Mon, Mar 11, 2019 at 09:18:47PM -0300, Matheus Tavares Bernardino wrote: > I've been thinking on how I could implement a test to estimate the > lock contention but had no success until now. I wanted to try > mutrace[2] but couldn't install it; I tried valgrind's drd but it > didn't seem to report a contention time estimation; And I tried > measuring the time of "pthread_mutex_lock(&grep_mutex)" but I don't > know how much significative this value is, since we can't directly > compare it (the time of many threads at lock) with the overall > execution time. Do you have an idea on how we could measure the lock > contention here? (I'm supposed to be doing something else, but this is more fun :D) Yeah lock contention is probably hard to measure. But at least we can measure how much time is blocked by mutex. Something like this [1] gives me $ time ~/w/git/temp/git grep --threads=8 abc HEAD >/dev/null warning: let's have some fun block_time = 20ms block_count = 10725 real 0m0,379s user 0m0,425s sys 0m0,073s >From this I know "git grep" took 379ms and at least 20ms (probably including measurement overhead) is wasted on pthread_mutex_lock(). It does not look that significant, I admit. > Another thing that is occurring to me right now is whether git-grep, > as it is implemented today, would really benefit from thread-safe pack > access. I may have to study the code more, but it seems to me that > just the producer thread uses pack access. That producer I think is just handing out assignments to worker threads. The real work is still done by worker threads. The entry point to pack access in this code is protected by grep_read_lock(). I believe the multithread one is in this deep call chain (I found it out by gdb) [main thread] grep_oid() -> add_work() -> [worker thread] grep_source() -> grep_source_1() -> grep_source_is_binary() -> grep_source_load() -> grep_source_load_oid() -> read_object_file() Note that there's another source of pack access, the lock_and_read_oid_file() in grep_tree(). This is where we unpack tree objects and traverse to get blob SHA-1. This code is currently on the main thread (maybe this is what you found?) so it does not really benefit from multi thread. We could still add some sort of lookahead queue to inflate tree objects in advance in parallel, but I don't know how much gain that will be. One thing I didn't notice is we currently force no threads in the case we need pack access, e.g. "git grep <regex> <commit>" or "git grep --cached <regex>". So if you need to experiment, you need to hack it and remove that restriction. That's the "let's have some fun" code in [1]. So, assuming this is CPU bottleneck, let's have a look at how CPU is used. perf record git grep --threads=1 abc HEAD >/dev/null perf report shows me the top CPU consumers are 51,16% git libz.so.1.2.11 [.] inflate_fast 19,55% git git [.] bmexec You need to do some guessing here. But I believe the top call is from inflating objects (either from packs or from loose objects). The second one is from regular expression engine. Since the regex here is very short (I deliberately try Jeff's case where object access dominates), the CPU is mostly used up for object inflation. That suggests that if we could spread it out across cores, the gain could be quite good. There aren't many dependencies in grep tasks so if we spread the workload on 8 cores, execution time should be reduced by 7 or 8 times. For fun, I hacked up a horrible, horrible "thread safe" version [2] that probably broke 99% of git and made helgrind extremely unhappy, so these numbers are at best guidelines, but.. $ time ./git grep --threads=1 abc HEAD >/dev/null real 0m0,253s user 0m0,225s sys 0m0,029s $ time ./git grep --threads=8 abc HEAD >/dev/null warning: let's have some fun! real 0m0,157s user 0m0,312s sys 0m0,089s You can see the "real" rows show quite good time reduction. Not 8 times reduction, mind you, but that's where serious people dig in and really speed it up ;-) [1] https://gitlab.com/snippets/1834609 [2] https://gitlab.com/snippets/1834613 > So, although it would be out of scope for GSoC, checkout, diff and log > (and maybe others) could all benefit from a thread-safe/parallel pack > access, right? If so, it is very motivating the impact this project > could, in theory, have :) We have to analyze case by case. It may turn out that there are many opportunity to utilize multi threads. I think checkout is definitely a good candidate. For "git diff" and "git log" maybe you can try "perf" to see how much workload is locked in pack access (mostly inflation, because it's easier to spot from the profile report) -- Duy