So I just need to git -c "diff.c.cachetextconv=true" ...

And the cache should automagically work?

Zach Riggle

On Tue, Jul 7, 2020 at 4:59 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Jul 07, 2020 at 04:25:01PM -0500, Zach Riggle wrote:
>
> > It looks like the bit of code that is responsible for performing
> > textconv conversions is single-threaded, even if git-grep is provided
> > a number of threads to use.
>
> Yes, the locking is much coarser than it could be. The issue is in
> grep.c's fill_textconv_grep():
>
>         /*
>          * fill_textconv is not remotely thread-safe; it modifies the global
>          * diff tempfile structure, writes to the_repo's odb and might
>          * internally call thread-unsafe functions such as the
>          * prepare_packed_git() lazy-initializator. Because of the last two, we
>          * must ensure mutual exclusion between this call and the object reading
>          * API, thus we use obj_read_lock() here.
>          *
>          * TODO: allowing text conversion to run in parallel with object
>          * reading operations might increase performance in the multithreaded
>          * non-worktreee git-grep with --textconv.
>          */
>         obj_read_lock();
>         size = fill_textconv(r, driver, df, &buf);
>         obj_read_unlock();
>         free_filespec(df);
>
> Note that this lock is used whether we're doing textconv's or not (i.e.,
> it also excludes reading two objects from the object database at the
> same time, because none of that code is thread-safe). But the latency
> when we're doing a textconv is _much_ higher, because it's shelling out
> to a separate process and reading/writing the contents. Note the
> much-higher system CPU in your second timing:
>
> > Note the difference in total CPU usage in the following expressions:
> >
> > $ git grep --threads 12 -e foobar --and -e fizzbuzz &> /dev/null
> > 0.24s user 0.28s system 710% cpu 0.073 total
> >
> > $ git grep --threads 12 -e foobar --and -e fizzbuzz --textconv &> /dev/null
> > 0.90s user 1.75s system 110% cpu 2.390 total
>
> So I think implementing that TODO would help a lot (because each
> textconv could in theory proceed in parallel).
>
> As workaround in the meantime, I suspect that enabling
> diff.<driver>.cachetextconv for your particular textconv config might
> help. It would be slow on the first run, but then we'd be able to skip
> the external process entirely for subsequent runs (the results are
> cached in a git-notes tree, which are just raw object reads).
>
> -Peff
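
For reference, a minimal sketch of what the cachetextconv workaround might look like in practice. It assumes the textconv driver is named "c" (as in the -c example at the top of this message) and that *.c files are routed to it through .gitattributes. The filter name my-textconv-filter is a hypothetical placeholder, not something from this thread; substitute whatever textconv command is already configured.

```shell
# Hypothetical textconv filter; replace with the command you already use.
git config diff.c.textconv "my-textconv-filter"

# Route *.c files to the "c" diff driver (skip if this is already set up).
echo '*.c diff=c' >> .gitattributes

# Enable the textconv cache for that driver persistently...
git config diff.c.cachetextconv true

# ...or only for a single invocation, without touching the config.
git -c diff.c.cachetextconv=true grep --threads 12 --textconv \
    -e foobar --and -e fizzbuzz HEAD
```

The last command searches HEAD rather than the worktree on purpose: as far as I can tell the cache is keyed on blob object IDs, so it may only help when grepping committed content, but treat that as an assumption worth checking with a timing run. The cached results live in a notes ref (refs/notes/textconv/c for a driver named "c"), so `git update-ref -d refs/notes/textconv/c` should discard them if the filter's output ever changes.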