On Thu, Jul 7, 2022 at 9:53 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Tue, Jul 5, 2022 at 6:08 AM Dian Xu <dianxudev@xxxxxxxxx> wrote:
> >
> > Hi Elijah,
>
> Hi Dian,
>
> Please don't top post on this list. It'd also help to respond to the
> relevant email instead of picking a different email in the thread to
> put your answers in. Anyway, that aside...
>
> > Please see answers below:
> >
> > 1. H: 2.27m; S: 7.7k; Total: 2.28m
> >
> > 2. Sure I will run 'reapply' after the sparse-checkout file has
> > changed. Just curious, do I have to run 'reapply' if 'checkout' is the
> > next immediate cmd? I thought 'checkout' does the updating index as
> > well
> >
> > 3. I simply added one file only, 'git add' and 'git add --sparse'
> > still hang. Let me know if you need me to send you any debug info from
> > pathspec.c/dir.c
> >
> > 4. Good to know and we are investigating if we have a way out from --no-cone
> >
> > 5. I should've been clearer: The experiment done here uses 2.37.0
>
> Thanks for providing these details. It was enough to at least get me
> started, and from my experiments, it appears the arguments to `git
> add` are important. In particular, I could not trigger this when
> passing actual filenames that existed. I could when passing a fake
> filename.
>
> Here's the concrete steps I used to reproduce:
>
>     git clone git@xxxxxxxxxx:newren/gvfs-like-git-bomb
>     cd gvfs-like-git-bomb
>
>     git init attempt
>     cd attempt
>     ../make-a-git-bomb.sh
>
>     time git checkout bomb
>
>     echo "/*" >.git/info/sparse-checkout
>     echo '!/bomb/j/j/' >>.git/info/sparse-checkout
>     for i in $(seq 1 10000); do
>         printf '!some/random/file/path-%05d\n' $i
>     done >>.git/info/sparse-checkout
>     git config core.sparseCheckout true
>     time git sparse-checkout reapply
>
>     echo hello >world
>     time git add --sparse world nonexistent
>     time git rm --cached --sparse world nonexistent
>     time git add world nonexistent
>     time git rm --cached world nonexistent
>
> This sequence of steps will (1) clone a repo with 2 files, (2) create
> another repository in subdirectory 'attempt' that has 1000001 files
> (but only two unique files, and only six or so unique trees) in a
> branch called 'bomb', (3) check it out, (4) create 10002 patterns for
> the sparse-checkout file (only the first 2 of which match anything)
> which will leave ~99% of files still present (990001 files checked out
> and 10000 files sparse) and turn on sparsity, (5) measure how long it
> takes to add and remove a file from the index, both with and without
> the --sparse flag, but always listing an extra path that won't match
> anything.
>
> The timings I see for the setup steps are:
>     4m10.444s checkout bomb
>     1m0.380s  sparse-checkout reapply
>
> And the timings for the add/rm steps are:
>     4m43.353s add --sparse world nonexistent
>     9m25.666s add world nonexistent
>     0m0.129s  rm --cached --sparse world nonexistent
>     9m23.601s rm --cached world nonexistent
>
> which shows that 'rm' also has a performance problem without the
> '--sparse' flag (which seems like another bug).
>
> Now, if I remove the 'nonexistent' argument from the commands, then
> the timings drop to:
>     0m0.236s  add --sparse world
>     0m0.233s  add world
>     0m0.175s  rm --cached --sparse world
>     4m43.744s rm --cached world
>
> So, I can reproduce some slowness. 'rm' without --sparse seems
> buggily slow for either set, whereas 'add' is only slow when given a
> fake path. You never mentioned anything about the arguments you were
> passing to `git add`, so I don't know whether you are using specific
> filenames that just don't exist (like I did above), or globs that
> perhaps match some files, or something else. That might be useful to
> know. But there appears to be something here for both 'add' and 'rm'
> that we could look into optimizing. I don't have time right now. I'm
> not sure if someone else has some time to look into it; if no one else
> does, I'll eventually try to get back to it.

Hi Elijah,

Thank you for sharing the reproduction steps. I believe they reflect our
workflow. We run 'git add <path_to_file>', where path_to_file is an
existing file that is also inside the sparse-checkout patterns. I'm not
sure whether this is related, but we also use --reference when setting
up the clone.

Dian Xu
Mathworks, Inc
1 Lakeside Campus Drive, Natick, MA 01760
508-647-3583