Re: git bug report: 'git add' hangs in a large repo which has sparse-checkout file with large number of patterns in it

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 5, 2022 at 6:08 AM Dian Xu <dianxudev@xxxxxxxxx> wrote:
>
> Hi Elijah,

Hi Dian,

Please don't top post on this list.  It'd also help to respond to the
relevant email instead of picking a different email in the thread to
put your answers in.  Anyway, that aside...

> Please see answers below:
>
> 1.  H: 2.27m; S: 7.7k; Total: 2.28m
>
> 2.  Sure I will run 'reapply' after the sparse-checkout file has
> changed. Just curious, do I have to run 'reapply' if 'checkout' is the
> next immediate cmd? I thought 'checkout' does the updating index as
> well
>
> 3.  I simply added one file only, 'git add' and 'git add --sparse'
> still hang. Let me know if you need me to send you any debug info from
> pathspec.c/dir.c
>
> 4.  Good to know and we are investigating if we have a way out from --no-cone
>
> 5.  I should've been clearer: The experiment done here uses 2.37.0

Thanks for providing these details.  It was enough to at least get me
started, and from my experiments, it appears the arguments to `git
add` are important.  In particular, I could not trigger this when
passing actual filenames that existed.  I could when passing a fake
filename.  Here's the concrete steps I used to reproduce:

    git clone git@xxxxxxxxxx:newren/gvfs-like-git-bomb
    cd gvfs-like-git-bomb

    git init attempt
    cd attempt
    ../make-a-git-bomb.sh

    time git checkout bomb

    echo "/*" >.git/info/sparse-checkout
    echo '!/bomb/j/j/' >>.git/info/sparse-checkout
    for i in $(seq 1 10000); do
        printf '!some/random/file/path-%05d\n' $i
    done >>.git/info/sparse-checkout
    git config core.sparseCheckout true
    time git sparse-checkout reapply

    echo hello >world
    time git add --sparse world nonexistent
    time git rm --cached --sparse world nonexistent
    time git add world nonexistent
    time git rm --cached world nonexistent

This sequence of steps will (1) clone a repo with 2 files, (2) create
another repository in subdirectory 'attempt' that has 1000001 files
(but only two unique files, and only six or so unique trees) in a
branch called 'bomb', (3) check it out, (4) create 10002 patterns for
the sparse-checkout file (only the first 2 of which match anything)
which will leave ~99% of files still present (990001 files checked out
and 10000 files sparse) and turn on sparsity, (5) measure how long it
takes to add and remove a file from the index, both with and without
the --sparse flag, but always listing an extra path that won't match
anything.

The timings I see for the setup steps are:
    4m10.444s  checkout bomb
    1m0.380s   sparse-checkout reapply

And the timings for the add/rm steps are:
    4m43.353s  add --sparse world nonexistent
    9m25.666s  add world nonexistent
    0m0.129s  rm --cached --sparse world nonexistent
    9m23.601s  rm --cached world nonexistent

which shows that 'rm' also has a performance problem without the
'--sparse' flag (which seems like another bug).

Now, if I remove the 'nonexistent' argument from the commands, then
the timings drop to:
    0m0.236s   add --sparse world
    0m0.233s   add world
    0m0.175s   rm --cached --sparse world
    4m43.744s  rm --cached world

So, I can reproduce some slowness.  'rm' without --sparse seems
buggily slow for either set, whereas 'add' is only slow when given a
fake path.  You never mentioned anything about the arguments you were
passing to `git add`, so I don't know whether you are using specific
filenames that just don't exist (like I did above), or globs that
perhaps match some files, or something else.  That might be useful to
know.  But there appears to be something here for both 'add' and 'rm'
that we could look into optimizing.  I don't have time right now.  I'm
not sure if someone else has some time to look into it; if no one else
does, I'll eventually try to get back to it.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux