Re: git grep --threads 12 --textconv is effectively single-threaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



So I just need to

git -c "diff.c.cachetextconv=true" ...

And the cache should automagically work?

Zach Riggle

On Tue, Jul 7, 2020 at 4:59 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Jul 07, 2020 at 04:25:01PM -0500, Zach Riggle wrote:
>
> > It looks like the bit of code that is responsible for performing
> > textconv conversions is single-threaded, even if git-grep is provided
> > a number of threads to use.
>
> Yes, the locking is much coarser than it could be. The issue is in
> grep.c's fill_textconv_grep():
>
>           /*
>            * fill_textconv is not remotely thread-safe; it modifies the global
>            * diff tempfile structure, writes to the_repo's odb and might
>            * internally call thread-unsafe functions such as the
>            * prepare_packed_git() lazy-initializator. Because of the last two, we
>            * must ensure mutual exclusion between this call and the object reading
>            * API, thus we use obj_read_lock() here.
>            *
>            * TODO: allowing text conversion to run in parallel with object
>            * reading operations might increase performance in the multithreaded
>            * non-worktreee git-grep with --textconv.
>            */
>           obj_read_lock();
>           size = fill_textconv(r, driver, df, &buf);
>           obj_read_unlock();
>           free_filespec(df);
>
> Note that this lock is used whether we're doing textconv's or not (i.e.,
> it also excludes reading two objects from the object database at the
> same time, because none of that code is thread-safe). But the latency
> when we're doing a textconv is _much_ higher, because it's shelling out
> to a separate process and reading/writing the contents. Note the
> much-higher system CPU in your second timing:
>
> > Note the difference in total CPU usage in the following expressions:
> >
> > $ git grep --threads 12 -e foobar --and -e fizzbuzz &> /dev/null
> > 0.24s user 0.28s system 710% cpu 0.073 total
> >
> > $ git grep --threads 12 -e foobar --and -e fizzbuzz --textconv &> /dev/null
> > 0.90s user 1.75s system 110% cpu 2.390 total
>
> So I think implementing that TODO would help a lot (because each
> textconv could in theory proceed in parallel).
>
> As workaround in the meantime, I suspect that enabling
> diff.<driver>.cachetextconv for your particular textconv config might
> help. It would be slow on the first run, but then we'd be able to skip
> the external process entirely for subsequent runs (the results are
> cached in a git-notes tree, which are just raw object reads).
>
> -Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux