On Mon, Jan 18, 2010 at 10:05:19AM -0800, Linus Torvalds wrote: > On my machine (4 cores with HT, so 8 threads total), I get the > following: > > [ Note: the --no-ext-grep is because I'm comparing with a git that has > the original grep optimization, but hasn't removed the external grep > functionality yet. I just used the same command line when then comparing > to next+your patch, so don't mind it. > > Also, the three examples are chosen to be: "trivial single match", > "regex single match", and "trivial lots of matches" ] You may be testing slightly different things: next has 885d211 (grep: rip out pessimization to use fixmatch()) and bbc09c2 (grep: rip out support for external grep) was added before. So next+patch uses regexec and the version with support for external greps uses strstr. IIRC strstr was a bit faster than regexec on your box. However, this only explains a small part of the results you are seeing. > > Before (all best-of-five), with the non-threaded internal grep: > > - git grep --no-ext-grep qwerty > > real 0m0.311s > user 0m0.164s > sys 0m0.144s > > - git grep --no-ext-grep qwerty.*as > > real 0m0.555s > user 0m0.416s > sys 0m0.132s > > - git grep --no-ext-grep function > > real 0m0.461s > user 0m0.276s > sys 0m0.180s > > After, with threading: > > - git grep --no-ext-grep qwerty > > real 0m0.152s > user 0m0.788s > sys 0m0.212s > > - git grep --no-ext-grep qwerty.*as > > real 0m0.148s > user 0m0.724s > sys 0m0.284s > > - git grep --no-ext-grep function > > real 0m0.241s > user 0m1.436s > sys 0m0.216s > > so now it's a clear win in all cases. And as you'd expect, the "complex > case with single output" is the one that wins most, since it's the one > that should have the most CPU usage, with the least synchronization. > > That said, it still does waste quite a bit of time doing this, and while > it almost doubles performance in the last case too, it does so at the cost > of quite a bit of overhead. > > (Some overhead is natural, especially since I have HT. Running two threads > on the same core does _not_ give double the performance, so the CPU time > almost doubling - because the threads themselves run slower - is not > unexpected. It's when the CPU time more than quadruples that I suspect > that it could be improved a bit). I haven't seen this overhead during my tests. But I'm _guessing_ that the problem is that the mutex grep_lock is held while the result is written to stdout. So when we are writing, no significant amount of work can be done by any thread. Here is a patch to fix this (applies on to of v3). diff --git a/builtin-grep.c b/builtin-grep.c index 8fca7a6..422b0ec 100644 --- a/builtin-grep.c +++ b/builtin-grep.c @@ -139,21 +139,48 @@ static int grep_file_async(struct grep_opt *opt, char *name, return 0; } +/* Mark the work_item as done. The function guarantees that w->done is + * set to 1 and that if w is included in a prefix of the range + * [todo_done, todo_start) where each work_item has .done == 1, then + * w->out will eventually be written to stdout. The writing is done + * either by this thread or by the thread which has set 'writing' to + * 1. + */ static void work_done(struct work_item *w) { - int old_done; + /* 1 if there is a thread which is writing data to stdout and + 0 otherwise. */ + static int writing = 0; + int old_done, write_idx, write_to; pthread_mutex_lock(&grep_lock); w->done = 1; old_done = todo_done; - for(; todo[todo_done].done && todo_done != todo_start; - todo_done = (todo_done+1) % ARRAY_SIZE(todo)) { - w = &todo[todo_done]; - write_or_die(1, w->out.buf, w->out.len); - if (w->type == WORK_BUF) - free(w->buf); - - free(w->name); + + /* We need a loop here if todo_start is changed while we write + * some data. */ + while (!writing && todo[todo_done].done && todo_done != todo_start) { + write_idx = todo_done; + write_to = todo_start; + writing = 1; + + /* Unlock the mutex while we write the data. This is + * safe as writing == 1 and hence there is at most one + * thread which writes data at any time. */ + pthread_mutex_unlock(&grep_lock); + for(; todo[write_idx].done && write_idx != write_to; + write_idx = (write_idx+1) % ARRAY_SIZE(todo)) { + w = &todo[write_idx]; + write_or_die(1, w->out.buf, w->out.len); + if (w->type == WORK_BUF) + free(w->buf); + + free(w->name); + } + + pthread_mutex_lock(&grep_lock); + writing = 0; + todo_done = write_idx; } if (old_done != todo_done) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html