[PATCH 9/9] grep: pre-load userdiff drivers when threaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.

Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.

We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:

  real    0m0.391s
  user    0m1.708s
  sys     0m0.584s

to:

  real    0m0.360s
  user    0m1.576s
  sys     0m0.572s

Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
The speedup is especially unimpressive when you consider that it won't
grow as the grep load grows. This is a pretty fast grep. If you used a
real regex, the whole thing would take even longer, and you will still
only be shaving off a few tens of milliseconds. So I wouldn't be
heart-broken if this patch was dropped. I included it because it's easy
to do, and maybe somebody with a slower machine would find the absolute
time difference more noticeable.

 builtin/grep.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index bc85a20..9fc3e95 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -85,8 +85,8 @@ static pthread_cond_t cond_result;
 
 static int skip_first_line;
 
-static void add_work(enum grep_source_type type, const char *name,
-		     const void *id)
+static void add_work(struct grep_opt *opt, enum grep_source_type type,
+		     const char *name, const void *id)
 {
 	grep_lock();
 
@@ -95,6 +95,8 @@ static void add_work(enum grep_source_type type, const char *name,
 	}
 
 	grep_source_init(&todo[todo_end].source, type, name, id);
+	if (opt->binary != GREP_BINARY_TEXT)
+		grep_source_load_driver(&todo[todo_end].source);
 	todo[todo_end].done = 0;
 	strbuf_reset(&todo[todo_end].out);
 	todo_end = (todo_end + 1) % ARRAY_SIZE(todo);
@@ -333,7 +335,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
 		strbuf_release(&pathbuf);
 		return 0;
 	} else
@@ -362,7 +364,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 
 #ifndef NO_PTHREADS
 	if (use_threads) {
-		add_work(GREP_SOURCE_FILE, buf.buf, filename);
+		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
 		strbuf_release(&buf);
 		return 0;
 	} else
-- 
1.7.9.3.gc3fce1.dirty
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]