[PATCH v4 0/5] Parallel Checkout (part 2)

Matheus Tavares <matheus.bernardino@xxxxxx> · Mon, 19 Apr 2021 16:53:30 -0300

This version is almost identical to v3, but the last patch incorporates
the typo fixes and other rewording suggestions Christian made about the
design doc in the last round.

I also decided to remove the sentence about step 3 dominating the
execution time, as that's not always the case, e.g. on a non-local
clone or with sparse-checkout.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1240 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v3:
1:  7096822c14 = 1:  7096822c14 unpack-trees: add basic support for parallel checkout
2:  4526516ea0 = 2:  4526516ea0 parallel-checkout: make it truly parallel
3:  ad165c0637 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  cf9e28dc0e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  415d4114aa ! 5:  fd929f072c parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* Step 4: Write the new index to disk.
     +
     +Step 3 is the focus of the "parallel checkout" effort described here.
    -+It dominates the execution time for most of the above command types.
     +
     +Sequential Implementation
     +-------------------------
    @@ Documentation/technical/parallel-checkout.txt (new)
     +It wouldn't be safe to perform Step 3b in parallel, as there could be
     +race conditions between file creations and removals. Instead, the
     +parallel checkout framework lets the sequential code handle Step 3b,
    -+and use parallel workers to replace the sequential
    ++and uses parallel workers to replace the sequential
     +`entry.c:write_entry()` calls from Step 3c.
     +
     +Rejected Multi-Threaded Solution
    @@ Documentation/technical/parallel-checkout.txt (new)
     +warning for the user, like the classic sequential checkout does.
     +
     +The workers are able to detect both collisions among the entries being
    -+concurrently written and collisions among parallel-eligible and
    -+ineligible entries. The general idea for collision detection is quite
    -+straightforward: for each parallel-eligible entry, the main process must
    -+remove all files that prevent this entry from being written (before
    -+enqueueing it). This includes any non-directory file in the leading path
    -+of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existing file at
    -+the entry's path. If any of these checks finds something, the worker
    -+knows that there was a path collision.
    ++concurrently written and collisions between a parallel-eligible entry
    ++and an ineligible entry. The general idea for collision detection is
    ++quite straightforward: for each parallel-eligible entry, the main
    ++process must remove all files that prevent this entry from being written
    ++(before enqueueing it). This includes any non-directory file in the
    ++leading path of the entry. Later, when a worker gets assigned the entry,
    ++it looks again for the non-directories files and for an already existing
    ++file at the entry's path. If any of these checks finds something, the
    ++worker knows that there was a path collision.
     +
     +Because parallel checkout can distinguish path collisions from the case
     +where the file was already present in the working tree before checkout,
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Besides, long-running filters may use the delayed checkout feature to
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
    -+separated.
    ++separate.
     ++
     +Note: regular files that only require internal filters, like end-of-line
     +conversion and re-encoding, are eligible for parallel checkout.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +The API
     +-------
     +
    -+The parallel checkout API was designed with the goal to minimize changes
    -+to the current users of the checkout machinery. This means that they
    -+don't have to call a different function for sequential or parallel
    ++The parallel checkout API was designed with the goal of minimizing
    ++changes to the current users of the checkout machinery. This means that
    ++they don't have to call a different function for sequential or parallel
     +checkout. As already mentioned, `checkout_entry()` will automatically
     +insert the given entry in the parallel checkout queue when this feature
     +is enabled and the entry is eligible; otherwise, it will just write the
-- 
2.30.1