Re: git bug report: 'git add' hangs in a large repo which has sparse-checkout file with large number of patterns in it

On Thu, Jul 7, 2022 at 9:53 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Tue, Jul 5, 2022 at 6:08 AM Dian Xu <dianxudev@xxxxxxxxx> wrote:
> >
> > Hi Elijah,
>
> Hi Dian,
>
> Please don't top post on this list.  It'd also help to respond to the
> relevant email instead of picking a different email in the thread to
> put your answers in.  Anyway, that aside...
>
> > Please see answers below:
> >
> > 1.  H: 2.27m; S: 7.7k; Total: 2.28m
> >
> > 2.  Sure I will run 'reapply' after the sparse-checkout file has
> > changed. Just curious, do I have to run 'reapply' if 'checkout' is the
> > next immediate cmd? I thought 'checkout' does the updating index as
> > well
> >
> > 3.  I simply added one file only, 'git add' and 'git add --sparse'
> > still hang. Let me know if you need me to send you any debug info from
> > pathspec.c/dir.c
> >
> > 4.  Good to know and we are investigating if we have a way out from --no-cone
> >
> > 5.  I should've been clearer: The experiment done here uses 2.37.0
>
> Thanks for providing these details.  It was enough to at least get me
> started, and from my experiments, it appears the arguments to `git
> add` are important.  In particular, I could not trigger this when
> passing actual filenames that existed.  I could when passing a fake
> filename.  Here are the concrete steps I used to reproduce:
>
>     git clone git@xxxxxxxxxx:newren/gvfs-like-git-bomb
>     cd gvfs-like-git-bomb
>
>     git init attempt
>     cd attempt
>     ../make-a-git-bomb.sh
>
>     time git checkout bomb
>
>     echo "/*" >.git/info/sparse-checkout
>     echo '!/bomb/j/j/' >>.git/info/sparse-checkout
>     for i in $(seq 1 10000); do
>         printf '!some/random/file/path-%05d\n' $i
>     done >>.git/info/sparse-checkout
>     git config core.sparseCheckout true
>     time git sparse-checkout reapply
>
>     echo hello >world
>     time git add --sparse world nonexistent
>     time git rm --cached --sparse world nonexistent
>     time git add world nonexistent
>     time git rm --cached world nonexistent
>
> This sequence of steps will (1) clone a repo with 2 files, (2) create
> another repository in subdirectory 'attempt' that has 1000001 files
> (but only two unique files, and only six or so unique trees) in a
> branch called 'bomb', (3) check it out, (4) create 10002 patterns for
> the sparse-checkout file (only the first 2 of which match anything)
> which will leave ~99% of files still present (990001 files checked out
> and 10000 files sparse) and turn on sparsity, (5) measure how long it
> takes to add and remove a file from the index, both with and without
> the --sparse flag, but always listing an extra path that won't match
> anything.
>
> The timings I see for the setup steps are:
>     4m10.444s  checkout bomb
>     1m0.380s   sparse-checkout reapply
>
> And the timings for the add/rm steps are:
>     4m43.353s  add --sparse world nonexistent
>     9m25.666s  add world nonexistent
>     0m0.129s   rm --cached --sparse world nonexistent
>     9m23.601s  rm --cached world nonexistent
>
> which shows that 'rm' also has a performance problem without the
> '--sparse' flag (which seems like another bug).
>
> Now, if I remove the 'nonexistent' argument from the commands, then
> the timings drop to:
>     0m0.236s   add --sparse world
>     0m0.233s   add world
>     0m0.175s   rm --cached --sparse world
>     4m43.744s  rm --cached world
>
> So, I can reproduce some slowness.  'rm' without --sparse seems
> buggily slow for either set, whereas 'add' is only slow when given a
> fake path.  You never mentioned anything about the arguments you were
> passing to `git add`, so I don't know whether you are using specific
> filenames that just don't exist (like I did above), or globs that
> perhaps match some files, or something else.  That might be useful to
> know.  But there appears to be something here for both 'add' and 'rm'
> that we could look into optimizing.  I don't have time right now.  I'm
> not sure if someone else has some time to look into it; if no one else
> does, I'll eventually try to get back to it.

Hi Elijah,

Thank you for sharing the reproduction steps. I believe they represent
our workflow.
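
For anyone following along, the pattern file from step (4) of the
quoted steps can be generated and sanity-checked on its own, before
pointing a repository at it. This is only a minimal sketch, assuming a
POSIX shell with seq available; the helper name gen_patterns and the
temporary file are illustrative and not part of the original steps.

```shell
# Hypothetical helper: emit the two leading patterns plus N negated
# filler patterns, mirroring the loop in the quoted reproduction steps.
gen_patterns() {
    count=${1:-10000}
    echo "/*"
    echo '!/bomb/j/j/'
    for i in $(seq 1 "$count"); do
        printf '!some/random/file/path-%05d\n' "$i"
    done
}

# Write the patterns to a scratch file (an assumption; the quoted steps
# write to .git/info/sparse-checkout) and report how many lines it has.
patterns=$(mktemp)
gen_patterns 10000 >"$patterns"
wc -l <"$patterns"
```

Passing a smaller count makes it easy to check how the timings change
with the number of patterns.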

We use 'git add <path_to_file>', where path_to_file is an existing
file that is also within the sparse-checkout shape.

I'm not sure whether this is related, but we also use --reference when
setting up the clone.

Dian Xu
Mathworks, Inc
1 Lakeside Campus Drive, Natick, MA 01760
508-647-3583


