On Mon, 11 Jan 2010, Linus Torvalds wrote: > > Without testing it, I can already ACK it. It looks like the > ObviouslyRightThing(tm) to do. But I'll run some numbers too. Ok, some good news, some meh news, and some bad news. The good news: the trivial numbers look good. It's noticeably faster than external grep for me when it does the 'fixmatch()' case, quite probably because fixmatch() on at least Linux/x86-64 (which is the only case I really care about) uses SSE to do the string ops. So on my Nehalem: [torvalds@nehalem linux]$ time git grep qwerty > /dev/null real 0m0.418s user 0m0.204s sys 0m0.136s [torvalds@nehalem linux]$ time git grep --no-ext-grep qwerty > /dev/null real 0m0.309s user 0m0.168s sys 0m0.136s and since that simple fixmatch case is the common one for me, I'm happy. The meh news: this shows how grep is faster than regexec() due to being a smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the numbers are - built-in: real 0m0.548s user 0m0.384s sys 0m0.152s - external: real 0m0.415s user 0m0.176s sys 0m0.160s so it really is just 'strstr()' that is faster. But This is a 'meh', because I don't really care, and the new code is still way faster than the old one. And I'd be personally willing to just drop the external grep if this is the worst problem. [ I worry a bit that some libc implementations of 'strstr' may suck, but I wouldn't lose sleep over it. ] The bad news is that you broke multi-line greps: git grep --no-ext-grep -2 qwerty.*as results in: drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] = drivers/char/keyboard.c- "\000\0331234567890-=\177\t" /* 0x00 - 0x0f */ drivers/char/keyboard.c: "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */ when the _correct_ result is drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] = drivers/char/keyboard.c- "\000\0331234567890-=\177\t" /* 0x00 - 0x0f */ drivers/char/keyboard.c: "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */ drivers/char/keyboard.c- "dfghjkl;'`\000\\zxcv" /* 0x20 - 0x2f */ drivers/char/keyboard.c- "bnm,./\000*\000 \000\201\202\203\204\205" /* 0x30 - 0x3f */ ie it didn't do the "two lines after" thing. That said, the external grep also gets this wrong (a different way), because it gets all the extra noise due to unnecessary separation lines, so for the external grep I actually get -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] = drivers/char/keyboard.c- "\000\0331234567890-=\177\t" /* 0x00 - 0x0f */ drivers/char/keyboard.c: "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */ drivers/char/keyboard.c- "dfghjkl;'`\000\\zxcv" /* 0x20 - 0x2f */ drivers/char/keyboard.c- "bnm,./\000*\000 \000\201\202\203\204\205" /* 0x30 - 0x3f */ -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- but that's a long-standing problem, and is more "ugly" than "wrong grep results". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html