Re: git bug report: 'git add' hangs in a large repo which has sparse-checkout file with large number of patterns in it

On Wed, Jun 29, 2022 at 12:50 PM Dian Xu <dianxudev@xxxxxxxxx> wrote:
>
> Dear Git developers,
>
> Reporting Issue:
>               'git add' hangs in a large repo which has
> sparse-checkout file with large number of patterns in it
>
> Found in:
>               Git 2.34.3. Issue occurs after 'audit for interaction
> with sparse-index' was introduced in add.c
>
> Reproduction steps:
>               1. Clone a repo which has e.g. 2 million plus files
>               2. Enable sparse checkout by: git config core.sparsecheckout true
>               3. Create a .git/info/sparse-checkout file with a large
> number of patterns, e.g. 16k plus lines

Did you run `git read-tree -mu HEAD` or even `git sparse-checkout
reapply` after step 3 and before step 4?  If not, you've left the
working tree out-of-sync with the specified sparsity paths and should
fix that before running step 4.
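
As a minimal sketch of that re-sync step, the following builds a tiny throwaway repository (the `src/` and `docs/` directories and the `/src/` pattern are made up for illustration, not taken from the report), writes the sparse-checkout file by hand as in steps 2 and 3, and then brings the working tree back in line with the patterns:

```shell
set -e

# Hypothetical throwaway repository standing in for the large one.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
mkdir -p src docs
echo "code" > src/main.c
echo "text" > docs/readme.txt
git add .
git commit -q -m "initial"

# Steps 2 and 3 from the report: enable sparse checkout, write patterns by hand.
git config core.sparsecheckout true
echo "/src/" > .git/info/sparse-checkout

# docs/readme.txt is still on disk here although it is outside the patterns.
# Re-sync the working tree with the sparsity patterns before step 4.
git read-tree -mu HEAD    # alternatively: git sparse-checkout reapply

# List what remains in the working tree.
ls
```

After the re-sync, paths outside the patterns should no longer be present in the working tree, so a later `git add` is not looking at files that the sparsity patterns say should not be there.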

>               4. Run 'git add', which will hang

Alternatively, if you really want to add a file and ignore the fact
that it might be outside the sparsity patterns (at the risk of it
later disappearing from the working tree when a checkout, rebase,
merge, or similar command runs), you can use
`git add --sparse $FILENAME`.

> Investigations:
>               1. Stack trace:
>                        add.c: cmd_add
>                   -> add.c: prune_directory
>                   -> pathspec.c: add_pathspec_matches_against_index
>                   -> dir.c: path_in_sparse_checkout_1
>               2. In Git 2.33.3, the loop at pathspec.c line 42 runs
> fast, even when istate->cache_nr is at 2 million
>               3. Since Git 2.34.3, the newly introduced 'audit for
> interaction with sparse-index' (dir.c line 1459:
> path_in_sparse_checkout_1) decides to loop through 2 million files and
> match each one of them against the sparse-checkout patterns
>               4. This hits the O(n^2) problem and thus causes 'git add'
> to hang (or take ~1.5 hours to finish)
>
> Please help us take a look at this issue and let us know if you need
> more information.

I'm also curious whether you can use --cone mode in sparse-checkout.
The O(N*M) behavior of sparse checkouts in non-cone mode (N index
entries, each matched against M patterns) is pretty fundamental, and
we may need to add more code paths that check the sparsity patterns
(i.e. more O(N*M) codepaths) to fix various user-observed bugs.  Using
--cone mode drops all of these to a linear cost.
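
For reference, a minimal sketch of switching to cone mode in a throwaway repository. The directory names `src`, `docs`, and `tools` are made up for illustration. Cone mode only accepts directory-based patterns, so whether your 16k patterns can be expressed this way is something you would have to check:

```shell
set -e

# Hypothetical throwaway repository with three top-level directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
mkdir -p src docs tools
echo "code" > src/main.c
echo "text" > docs/readme.txt
echo "echo build" > tools/build.sh
git add .
git commit -q -m "initial"

# Enable sparse checkout in cone mode, then select whole directories
# instead of writing patterns into .git/info/sparse-checkout by hand.
git sparse-checkout init --cone
git sparse-checkout set src tools

# Show the directories currently in the cone.
git sparse-checkout list
```

Because `git sparse-checkout set` updates the working tree itself, no separate `git read-tree -mu HEAD` step is needed afterwards.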


