Re: git grep --threads 12 --textconv is effectively single-threaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 07, 2020 at 04:25:01PM -0500, Zach Riggle wrote:

> It looks like the bit of code that is responsible for performing
> textconv conversions is single-threaded, even if git-grep is provided
> a number of threads to use.

Yes, the locking is much coarser than it could be. The issue is in
grep.c's fill_textconv_grep():

          /*
           * fill_textconv is not remotely thread-safe; it modifies the global
           * diff tempfile structure, writes to the_repo's odb and might
           * internally call thread-unsafe functions such as the
           * prepare_packed_git() lazy-initializator. Because of the last two, we
           * must ensure mutual exclusion between this call and the object reading
           * API, thus we use obj_read_lock() here.
           *
           * TODO: allowing text conversion to run in parallel with object
           * reading operations might increase performance in the multithreaded
           * non-worktreee git-grep with --textconv.
           */
          obj_read_lock();
          size = fill_textconv(r, driver, df, &buf);
          obj_read_unlock();
          free_filespec(df);

Note that this lock is used whether we're doing textconv's or not (i.e.,
it also excludes reading two objects from the object database at the
same time, because none of that code is thread-safe). But the latency
when we're doing a textconv is _much_ higher, because it's shelling out
to a separate process and reading/writing the contents. Note the
much-higher system CPU in your second timing:

> Note the difference in total CPU usage in the following expressions:
> 
> $ git grep --threads 12 -e foobar --and -e fizzbuzz &> /dev/null
> 0.24s user 0.28s system 710% cpu 0.073 total
> 
> $ git grep --threads 12 -e foobar --and -e fizzbuzz --textconv &> /dev/null
> 0.90s user 1.75s system 110% cpu 2.390 total

So I think implementing that TODO would help a lot (because each
textconv could in theory proceed in parallel).

As workaround in the meantime, I suspect that enabling
diff.<driver>.cachetextconv for your particular textconv config might
help. It would be slow on the first run, but then we'd be able to skip
the external process entirely for subsequent runs (the results are
cached in a git-notes tree, which are just raw object reads).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux