This version is almost identical to v3, but the last patch incorporates
the typo fixes and other rewording suggestions Christian made about the
design doc in the last round. I also removed the sentence about step 3
dominating the execution time, as that's not always the case, e.g. on a
non-local clone or with sparse-checkout.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1240 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v3:
1:  7096822c14 = 1:  7096822c14 unpack-trees: add basic support for parallel checkout
2:  4526516ea0 = 2:  4526516ea0 parallel-checkout: make it truly parallel
3:  ad165c0637 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  cf9e28dc0e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  415d4114aa ! 5:  fd929f072c parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* Step 4: Write the new index to disk.
     +
     +Step 3 is the focus of the "parallel checkout" effort described here.
    -+It dominates the execution time for most of the above command types.
     +
     +Sequential Implementation
     +-------------------------
    @@ Documentation/technical/parallel-checkout.txt (new)
     +It wouldn't be safe to perform Step 3b in parallel, as there could be
     +race conditions between file creations and removals. Instead, the
     +parallel checkout framework lets the sequential code handle Step 3b,
    -+and use parallel workers to replace the sequential
    ++and uses parallel workers to replace the sequential
     +`entry.c:write_entry()` calls from Step 3c.
     +
     +Rejected Multi-Threaded Solution
    @@ Documentation/technical/parallel-checkout.txt (new)
     +warning for the user, like the classic sequential checkout does.
     +
     +The workers are able to detect both collisions among the entries being
    -+concurrently written and collisions among parallel-eligible and
    -+ineligible entries. The general idea for collision detection is quite
    -+straightforward: for each parallel-eligible entry, the main process must
    -+remove all files that prevent this entry from being written (before
    -+enqueueing it). This includes any non-directory file in the leading path
    -+of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existing file at
    -+the entry's path. If any of these checks finds something, the worker
    -+knows that there was a path collision.
    ++concurrently written and collisions between a parallel-eligible entry
    ++and an ineligible entry. The general idea for collision detection is
    ++quite straightforward: for each parallel-eligible entry, the main
    ++process must remove all files that prevent this entry from being written
    ++(before enqueueing it). This includes any non-directory file in the
    ++leading path of the entry. Later, when a worker gets assigned the entry,
    ++it looks again for the non-directories files and for an already existing
    ++file at the entry's path. If any of these checks finds something, the
    ++worker knows that there was a path collision.
     +
     +Because parallel checkout can distinguish path collisions from the case
     +where the file was already present in the working tree before checkout,
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Besides, long-running filters may use the delayed checkout feature to
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
    -+separated.
    ++separate.
     ++
     +Note: regular files that only require internal filters, like end-of-line
     +conversion and re-encoding, are eligible for parallel checkout.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +The API
     +-------
     +
    -+The parallel checkout API was designed with the goal to minimize changes
    -+to the current users of the checkout machinery. This means that they
    -+don't have to call a different function for sequential or parallel
    ++The parallel checkout API was designed with the goal of minimizing
    ++changes to the current users of the checkout machinery. This means that
    ++they don't have to call a different function for sequential or parallel
     +checkout. As already mentioned, `checkout_entry()` will automatically
     +insert the given entry in the parallel checkout queue when this feature
     +is enabled and the entry is eligible; otherwise, it will just write the
-- 
2.30.1