Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> In fact, on the instruction-level profile, it looks like it's all spent on
> four instructions:
>
>      3.51 :   45aec0:  4c 85 72 10   test   %r14,0x10(%rdx)
>     86.27 :   45aec4:  48 0f 45 ca   cmovne %rdx,%rcx
>           :                  if (sline->lost_head) {
>           :                          struct lline *last_one = NULL;
>           :                          /* We cannot squash it with earlier one */
>           :                          for (lline = sline->lost_head;
>           :                               lline;
>           :                               lline = lline->next)
>      6.69 :   45aec8:  48 8b 12      mov    (%rdx),%rdx
>           :
>           :          /* Check to see if we can squash things */
>           :          if (sline->lost_head) {
>           :                  struct lline *last_one = NULL;
>           :                  /* We cannot squash it with earlier one */
>           :                  for (lline = sline->lost_head;
>      3.51 :   45aecb:  48 85 d2      test   %rdx,%rdx
>      0.01 :   45aece:  75 f0         jne    45aec0 <consume_line+0xc0>
>
> (ok, five, I included the 0.01% of the branch instruction that finishes
> the loop - the way Nehalem works, it will merge those test+jne
> instructions into one macro-instruction internally, which is why the
> branch that is obviously part of the loop doesn't get a lot of hits)
>
> Just FYI, those four instructions come from inlining 'append_lost()'. It's
> compiled that first for-loop into some very good assembly language, but
> the problem is that this all looks very much quadratic in number of
> lost_lines. The quality of the compiler thus sadly in no way helps make up
> for a really really expensive algorithm.
>
> Any ideas? What's that cmovne?

The function append_lost() is the meat of combining.  When we have
seen a hunk like this:

@@ -l,k +m,n @@
-lost line
-another lost line
 common line
+added line

we queue the lost lines in front of a surviving line (that is, the
sline that points at "common line").  "lost line" and "another lost
line" are each stored in an lline ("lost line") structure and queued
onto sline->lost_head.

When another diff that corresponds to this section looks like this:

@@ -l,k +m1,n1 @@
-lost line
 common line
+added line

the function looks at "lost line" and tries to find an identical one
in the list of llines.  We find one, and mark the lline element as
shared between the two parents.  That allows the output routine to say

@@@ -l,k +m,n +m1,n1 @@@
--lost line
- another lost line
  common line
++added line

If that cmovne that eats 86.27% of the run time has something to do
with the memcmp() that is done repeatedly, and if the lost_head list
grows very long, I think a possible optimization is to add a line hash
value to struct lline, so that most candidates can be rejected with a
cheap integer comparison before we fall back to memcmp().

After feeding the first diff (forget the +m1,n1 example above), if the
second diff looks like this:

@@ -l,k +m2,n2 @@
-another lost line
-lost line
 common line
+added line

we match "-another lost line" with the second element in the lost_head
list, and when we then process "-lost line", we have to avoid matching
it with the first element in the list, which we currently do by
stupidly scanning from the front of the list every time.  I think we
should add a member to struct sline that remembers, for the current
parent, the last element in the lost_head list we matched, so that the
scan can resume from there instead.

Perhaps it is that scan that is costing us, not the memcmp()?
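To make that concrete, below is a rough, untested sketch of the two
ideas together.  It is deliberately not the code in combine-diff.c;
the "hash" and "last_match" members, the hash_line()/find_lost()
helpers, and the struct layouts are made up purely for illustration.

/* Illustration only -- not the actual combine-diff.c structures. */
#include <stddef.h>
#include <string.h>

struct lline {
	struct lline *next;
	int len;
	unsigned long parent_map;
	unsigned int hash;	/* proposed: cached hash of the line text */
	char line[];		/* flexible array member for the line text */
};

struct sline {
	struct lline *lost_head, **lost_tail;
	struct lline *last_match;	/* proposed: last lline matched for the
					 * parent being fed; reset to NULL when
					 * we start feeding the next parent */
	/* bol, len, flag, etc. omitted from this sketch */
};

static unsigned int hash_line(const char *line, int len)
{
	unsigned int h = 5381;
	while (len--)
		h = (h << 5) + h + (unsigned char)*line++;
	return h;
}

static struct lline *find_lost(struct sline *sline, const char *line, int len)
{
	unsigned int hash = hash_line(line, len);
	struct lline *lline = sline->last_match ?
		sline->last_match->next : sline->lost_head;

	/*
	 * A lost line of this parent cannot match anything before the
	 * previous match, so resume the scan there; compare the cheap
	 * hash and length first and run memcmp() only when both agree.
	 */
	for (; lline; lline = lline->next) {
		if (lline->hash != hash || lline->len != len)
			continue;
		if (memcmp(lline->line, line, len))
			continue;
		sline->last_match = lline;
		return lline;
	}
	return NULL;	/* caller would append a new lline, as today */
}

Even without the hash, remembering last_match alone turns the repeated
front-to-back scans into a single forward pass over the list per
parent, as long as the lost lines of that parent appear in the same
order as they sit in the list.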