On Tue, Jul 07, 2020 at 04:25:01PM -0500, Zach Riggle wrote: > It looks like the bit of code that is responsible for performing > textconv conversions is single-threaded, even if git-grep is provided > a number of threads to use. Yes, the locking is much coarser than it could be. The issue is in grep.c's fill_textconv_grep(): /* * fill_textconv is not remotely thread-safe; it modifies the global * diff tempfile structure, writes to the_repo's odb and might * internally call thread-unsafe functions such as the * prepare_packed_git() lazy-initializator. Because of the last two, we * must ensure mutual exclusion between this call and the object reading * API, thus we use obj_read_lock() here. * * TODO: allowing text conversion to run in parallel with object * reading operations might increase performance in the multithreaded * non-worktreee git-grep with --textconv. */ obj_read_lock(); size = fill_textconv(r, driver, df, &buf); obj_read_unlock(); free_filespec(df); Note that this lock is used whether we're doing textconv's or not (i.e., it also excludes reading two objects from the object database at the same time, because none of that code is thread-safe). But the latency when we're doing a textconv is _much_ higher, because it's shelling out to a separate process and reading/writing the contents. Note the much-higher system CPU in your second timing: > Note the difference in total CPU usage in the following expressions: > > $ git grep --threads 12 -e foobar --and -e fizzbuzz &> /dev/null > 0.24s user 0.28s system 710% cpu 0.073 total > > $ git grep --threads 12 -e foobar --and -e fizzbuzz --textconv &> /dev/null > 0.90s user 1.75s system 110% cpu 2.390 total So I think implementing that TODO would help a lot (because each textconv could in theory proceed in parallel). As workaround in the meantime, I suspect that enabling diff.<driver>.cachetextconv for your particular textconv config might help. It would be slow on the first run, but then we'd be able to skip the external process entirely for subsequent runs (the results are cached in a git-notes tree, which are just raw object reads). -Peff