Re: [PATCH] grep: do not do external grep on skip-worktree entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 11 Jan 2010, Linus Torvalds wrote:
>
> Without testing it, I can already ACK it. It looks like the 
> ObviouslyRightThing(tm) to do. But I'll run some numbers too.

Ok, some good news, some meh news, and some bad news.

The good news: the trivial numbers look good. It's noticeably faster than 
external grep for me when it does the 'fixmatch()' case, quite probably 
because fixmatch() on at least Linux/x86-64 (which is the only case I 
really care about) uses SSE to do the string ops.

So on my Nehalem:

	[torvalds@nehalem linux]$ time git grep qwerty > /dev/null 

	real	0m0.418s
	user	0m0.204s
	sys	0m0.136s

	[torvalds@nehalem linux]$ time git grep --no-ext-grep qwerty > /dev/null 

	real	0m0.309s
	user	0m0.168s
	sys	0m0.136s

and since that simple fixmatch case is the common one for me, I'm happy.

The meh news: this shows how grep is faster than regexec() due to being a 
smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the 
numbers are

 - built-in:
	real	0m0.548s
	user	0m0.384s
	sys	0m0.152s

 - external:
	real	0m0.415s
	user	0m0.176s
	sys	0m0.160s

so it really is just 'strstr()' that is faster. But This is a 'meh', 
because I don't really care, and the new code is still way faster than the 
old one. And I'd be personally willing to just drop the external grep if 
this is the worst problem.

[ I worry a bit that some libc implementations of 'strstr' may suck, but I 
  wouldn't lose sleep over it. ]

The bad news is that you broke multi-line greps:

	git grep --no-ext-grep -2 qwerty.*as

results in:

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

when the _correct_ result is 

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */

ie it didn't do the "two lines after" thing.

That said, the external grep also gets this wrong (a different way), 
because it gets all the extra noise due to unnecessary separation lines, 
so for the external grep I actually get

	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--

but that's a long-standing problem, and is more "ugly" than "wrong grep 
results".

			Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]