When working directories get big, checkout times start to suffer. Even with
GVFS virtualization (which limits git to updating only those files that have
been changed locally) we're seeing a P50 checkout time of 31 seconds and a
P80 time of 43 seconds. Here is a checkout command with tracing turned on to
demonstrate where the time is spent. Note, this is somewhat of a "best case"
as I'm simply checking out the current commit:

benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
$ /usr/src/git/git.exe checkout
12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
Waiting for GVFS to parse index and update placeholder files...Succeeded
12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout

Clearly, most of the time (41 seconds) is spent in the traverse_trees() code,
so the question is: how can we significantly speed up that portion of the
command? I investigated a few options with limited success:

ODB cache
=========
Since traverse_trees() hits the ODB for each tree object (of which there are
over 500K in this repo), I wrote and tested an in-memory ODB cache that
caches all tree objects. This resulted in a >50% hit ratio (largely because
we traverse the tree twice during checkout) but saved only about 1.3 seconds.

Tree Graph File
===============
I also considered storing the commit's tree in an alternate structure that is
faster to load/parse (a la the commit-graph), but the cache results above,
along with the negligible impact of running checkout back to back (thus
ensuring the objects were already in my file system cache), made me believe
this would not result in much savings. MIDX has already helped out here,
given we end up with a lot of pack files of commits and trees.

Sparse tree traversal
=====================
We've sped up other parts of git by taking advantage of the existing
sparse-checkout/excludes logic to limit the files git has to consider to
those that have been modified by the user locally. I haven't been able to
think of a way to take advantage of that in unpack_trees(), because when you
are merging n commits, a change/conflict can occur in any tree object, so
they must all be traversed. If I'm missing something here and there _is_ a
way to entirely skip large parts of the tree, please let me know!

Please note that we're already limiting the files git needs to update in the
working directory via sparse-checkout/excludes, but the other/merge logic
still executes for the entire tree whether there are files to update or not.

Multi-threading unpack_trees()
==============================
The current model of unpack_trees() is that a single thread recursively
traverses each tree object as it comes across it. One thought I had was to
multi-thread the traversal so that each tree object could be processed in
parallel.
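To make the shape of that idea concrete, here is a minimal, self-contained
sketch of a mutex-based MPMC work queue driving a pool of worker threads.
The mpmcq_* names and the tree_job struct are hypothetical stand-ins for
illustration only, not the API in the attached patches:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-tree traversal state. */
struct tree_job {
	struct tree_job *next;
	int depth;
};

/* A minimal mutex-based MPMC queue; not the mpmcqueue.h from the patches. */
struct mpmcq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct tree_job *head;
	int pending;	/* jobs queued or still being processed */
};

static void mpmcq_push(struct mpmcq *q, struct tree_job *job)
{
	pthread_mutex_lock(&q->lock);
	job->next = q->head;
	q->head = job;
	q->pending++;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Block until a job is available; return NULL once all work is done. */
static struct tree_job *mpmcq_pop(struct mpmcq *q)
{
	struct tree_job *job;

	pthread_mutex_lock(&q->lock);
	while (!q->head && q->pending)
		pthread_cond_wait(&q->cond, &q->lock);
	if (!q->head) {
		pthread_mutex_unlock(&q->lock);
		return NULL;	/* pending == 0: traversal is complete */
	}
	job = q->head;
	q->head = job->next;
	pthread_mutex_unlock(&q->lock);
	return job;
}

/* Mark one job finished; wake idle workers once everything is done. */
static void mpmcq_done(struct mpmcq *q)
{
	pthread_mutex_lock(&q->lock);
	if (!--q->pending)
		pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

static void *worker(void *arg)
{
	struct mpmcq *q = arg;
	struct tree_job *job;

	while ((job = mpmcq_pop(q))) {
		/*
		 * The real implementation would parse the tree object here
		 * and push a new job for each subtree it finds. This demo
		 * just fakes a tree that is two levels deep.
		 */
		printf("processing tree at depth %d\n", job->depth);
		if (job->depth < 2) {
			struct tree_job *sub = malloc(sizeof(*sub));
			sub->depth = job->depth + 1;
			mpmcq_push(q, sub);
		}
		free(job);
		mpmcq_done(q);	/* must come after pushing subtrees */
	}
	return NULL;
}

int main(void)
{
	struct mpmcq q;
	pthread_t threads[4];
	struct tree_job *root = malloc(sizeof(*root));
	int i;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	q.head = NULL;
	q.pending = 0;
	root->depth = 0;
	mpmcq_push(&q, root);
	for (i = 0; i < 4; i++)
		pthread_create(&threads[i], NULL, worker, &q);
	for (i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

The pending counter tracks jobs that are queued or in flight, which is what
lets an idle worker distinguish "queue momentarily empty" from "traversal
complete" and exit cleanly.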
To test this idea out, I wrote an unbounded Multi-Producer-Multi-Consumer
(MPMC) queue and then wrote a traverse_trees_parallel() function that adds
any new tree objects into the queue, where they can be processed by a pool
of worker threads. Each thread wakes up when there is work in the queue,
removes a tree object, and processes it, adding any additional tree objects
it finds.

Multi-threading anything in git is fraught with challenges, as much of the
code base is not thread safe. To make progress, I wrapped mutexes around the
code paths that were not thread safe. The end result is that I won't
initially get much parallelization (due to mutexes around all the expensive
work), but at least I can test out the idea and resolve any other issues
with switching from a serial to a parallel implementation. If this works
out, I can update more of the code paths to be thread safe and/or move to
more fine-grained mutexes around those paths that are difficult to make
thread safe.

Final thoughts
==============
The attached set of patches don't work! They succeed for some commands, but
I'm including them only to make explicit what I'm currently investigating.
I'd be very interested in design feedback, but formatting/spelling/
whitespace errors are less useful at this early stage in the investigation.

When I brought up this idea with some other git contributors, they mentioned
that multi-threading unpack_trees() had been discussed a few years ago on
the list but that the idea was discarded. They couldn't remember exactly why
it was discarded, and none of us have been able to find the email threads
from that earlier discussion. As a result, I decided to write up this RFC
and see if the greater git community has ideas, suggestions, or more
background/history on whether this is a reasonable path to pursue, or if
there are other/better ideas on how to speed up checkout, especially on
large repos.

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/a022a91ceb
Checkout: git fetch https://github.com/benpeart/git unpacktrees-v1 && git checkout a022a91ceb

Ben Peart (3):
  add unbounded Multi-Producer-Multi-Consumer queue
  add performance tracing around traverse_trees() in unpack_trees()
  Add initial parallel version of unpack_trees()

 Makefile       |   1 +
 cache.h        |   1 +
 config.c       |   5 +
 environment.c  |   1 +
 mpmcqueue.c    |  49 ++++++++
 mpmcqueue.h    |  80 +++++++++++++
 unpack-trees.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |  30 +++++
 8 files changed, 480 insertions(+), 1 deletion(-)
 create mode 100644 mpmcqueue.c
 create mode 100644 mpmcqueue.h

base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
--
2.17.0.gvfs.1.123.g449c066