Re: [PATCH v7 00/16] Sparse-index: integrate with status

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 28, 2021 at 7:04 PM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> The latter branch is needed because it changes the behavior of 'git add'
> around sparse entries, which changes the expectations of a test added in
> patch 1.
>
> The approach here is to audit the places where ensure_full_index() pops up
> while doing normal commands with pathspecs within the sparse-checkout
> definition. Each of these are checked and tested. In the end, the
> sparse-index is integrated with these features:
>
>  * git status
>  * FS Monitor index extension.
>
> The performance tests in p2000-sparse-operations.sh improve by 95% or more,
> even when compared with the full-index cases, not just the sparse-index
> cases that previously had extra overhead.
>
> Hopefully this is the first example of how ds/sparse-index-protections has
> done the basic work to do these conversions safely, making them look easier
> than they seemed when starting this adventure.
>
> Thanks, -Stolee
>
>
> Update in V7 (relative to v5)
> =============================
>
> APOLOGIES: As I was working on this cover letter, I was still organizing my
> big list of patches, including reordering some into this series. I forgot to
> actually include them in my v6 submission, so here is a re-submission.
> Please ignore v6.
>
> I'm sorry that this revision took so long. Initially I was blocked on
> getting the directory/file conflict figured out (I did), but also my team
> was very busy with some things. Eventually, we reached an internal deadline
> to make an experimental release available [1] with initial sparse-index
> performance boosts. Creating that included some additional review by Jeff
> Hostetler and Johannes Schindelin which led to more changes in this version.
>
> The good news is that this series has now created the basis for many Git
> commands to integrate with the sparse-index without much additional work.
> This effort was unfortunately overloaded on this series because the changes
> needed for things like 'git checkout' or 'git add' all intersect with the
> changes needed for 'git status'. Might as well get it right the first time.
>
> Because the range-diff is a big difficult to read this time, I'll break the
> changes down on a patch-by-patch basis.
>
>  1. sparse-index: skip indexes with unmerged entries
>
>     (no change)
>
>  2. sparse-index: include EXTENDED flag when expanding
>
>  * Commit message better describes the purpose of the change.
>
>  3. t1092: replace incorrect 'echo' with 'cat'
>
>  * Typo fix
>
>  4. t1092: expand repository data shape
>
>  * some files are added that surround "folder1/" immediately before and
>    after, based on the sorting with the trailing slash. This provides extra
>    coverage.
>
>  5. t1092: add tests for status/add and sparse files
>
>     (no change)
>
>  6. unpack-trees: preserve cache_bottom
>
>     (no change)
>
>  7. unpack-trees: compare sparse directories correctly
>
>  * We were previosly not comparing the path lengths, which causes a problem
>    (with later updates) when a sparse directory such as "folder1/0/" gets
>    compared to a tree name "folder1".
>
>  8. unpack-trees: rename unpack_nondirectories()
>
>  * This new commit changes the name to make more sense with its new behavior
>    that could modify a sparse directory entry. The point of the method is in
>    contrast to recursing into trees.
>
>  9. unpack-trees: unpack sparse directory entries
>
>  * THIS is the biggest change from previous versions. There were a few
>    things going on that were tricky to get right, especially with the
>    directory/file conflict (handled in an update in the following, new
>    patch).
>
>  * The changes to create_ce_entry() regarding alloc_len missed a spot that
>    was critical to getting the length right in the allocated entry.
>
>  * Use '\0' over 0 to represent the terminating character.
>
>  * We don't need a "sparse_directory" parameter to unpack_nondirectories()
>    (which was renamed to unpack_single_entry() by the previous new patch)
>    because we can use dirmask to discover if src[0] (or any other value)
>    should be a sparse directory entry.
>
>  * Similarly, we don't need to call the method twice from unpack_callback().
>
>  * The 'conflicts' variable is set to match the dirmask in the beginning,
>    but it should depend on whether or not we have a sparse directory entry
>    instead, and if all trees that have the path have a directory.
>
>  * The implementation of find_cache_entry() uses find_cache_pos() to find an
>    insertion position for a path if it doesn't find an exact match. Before,
>    we subtracted one to find the sparse directory entry, but there could be
>    multiple paths between the sparse directory entry and the insertion
>    point, so we need to walk backwards until we find it. This requires many
>    paths having the same prefix, so hopefully is a rare case. Some of the
>    test data changes were added to cover the need for this logic. This uses
>    a helper method, sparse_dir_matches_path, which is also used by
>    is_sparse_directory_entry.
>
>  10. unpack-trees: handle dir/file conflict of sparse entries
>
>  * This new logic inside twoway_merge handles the special case for dealing
>    with a directory/file conflict during a 'git checkout'. The necessarily
>    data and tests are also added here, though the logic will only take
>    serious effect when we integrate with 'git checkout' later.
>
>  11. dir.c: accept a directory as part of cone-mode patterns
>
>  * The value slash_pos was previously a pointer within a strbuf, but in some
>    cases we add to that strbuf and that could reallocate the pointer, making
>    slash_pos be invalid. The replacement is to have slash_pos be an integer
>    position within the string, so it is consistent even if the string is
>    reallocated for an append.
>
>  12. diff-lib: handle index diffs with sparse dirs
>
>  * As recommended in the previous review, a simple diff_tree_oid() replaces
>    the complicated use of read_tree_at() and traverse_trees() in the
>    previous version.
>
>  13. status: skip sparse-checkout percentage with sparse-index
>
>      (no change)
>
>  14. status: use sparse-index throughout
>
>      (no change)
>
>  15. wt-status: expand added sparse directory entries
>
>  * Duplicate 'git status --porcelain=v2' lines are removed from tests.
>
>  * The pathspec is initialized using "= { 0 }" instead of memset().
>
>  16. fsmonitor: integrate with sparse index
>
>  * An extra test_region is added to ensure that the filesystem monitor hook
>    is still being called, and we are not simply disabling the feature
>    entirely.

This is SUPER exciting.  I've only read the cover letter, but it
strongly suggests you've not only handled all my feedback in previous
rounds, but got things pretty solidly nailed away.  I'll try to make
some time to go over it all soon.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux