Hi Dian,

As a heads up, note that on this list we don't top-post.

On Fri, Jul 1, 2022 at 1:24 PM Dian Xu <dianxudev@xxxxxxxxx> wrote:
>
> Hi Victoria, Elijah, Derrick,
>
> Thanks a lot for the detailed insight.
>
> (Btw our company’s email mathworks.com is blocked by
> mailto:git@xxxxxxxxxxxxxxx, hope someone can help take a look)

Konstantin: Is this something you know how to look into? (Or do you
know who to ask?)

> 1. We use a no-cone version of sparse-checkout to control the 'shape'
> (set of scm files) of our source code. In this case, the local sandbox
> is not necessarily 'sparse' (2m files), but it's very convenient that
> we can use git to check out the exact amount (shape) of files. To
> Victoria's question, all these 2m files are "H".

How many are "H", how many are "S", and how many files in total? I'd
like to try to construct a way to reproduce your issue, and knowing
how many of each will help.

> 2. Below is the detail steps to create the local repo (sparse-checkout
> was defined 'before' git checkout)
> % git init
> % git remote add origin <url>
> % git config core.sparsecheckout true
> % vi .git/info/sparse-checkout
> % git fetch
> % git checkout -b <SHA>
> Do I still need to 'git sparse-checkout reapply' after checkout?
> (Thanks for pointing out to run reapply once .git/info/sparse-checkout
> changed)

Why didn't you list 'git sparse-checkout reapply' after editing
.git/info/sparse-checkout? You mention it later, so I'm hoping you
ran it at that point.

You should only need to run the sparse-checkout reapply command after
manually editing the .git/info/sparse-checkout file. There are
special cases where it might be useful after other commands, but it's
pretty rare. Most git commands, and particularly checkout, will keep
the sparsity of the working tree up-to-date with the sparse-checkout
file -- assuming it was up-to-date beforehand.
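To make that concrete, here is a minimal sketch on a throwaway repository. It assumes a POSIX shell and a git new enough to have 'git sparse-checkout reapply'; the file names and the /keep/ pattern are made up for illustration and have nothing to do with your setup. It also shows one way to count "H" versus "S" entries with 'git ls-files -t'.

```shell
#!/bin/sh
# Sketch only: hypothetical paths and pattern, scratch repo in a temp dir.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir keep drop
echo a > keep/a.txt
echo b > drop/b.txt
echo t > top.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Non-cone sparse-checkout, with the pattern file edited by hand.
# (Setting core.sparseCheckoutCone explicitly is an assumption made here
# so the sketch behaves the same across git versions.)
git config core.sparseCheckout true
git config core.sparseCheckoutCone false
mkdir -p .git/info
echo '/keep/' > .git/info/sparse-checkout

# The manual edit alone does not touch the working tree; reapply does.
git sparse-checkout reapply

# Count index entries per tag: H = checked out, S = skip-worktree.
git ls-files -t | cut -c1 | sort | uniq -c
```

After the reapply, only keep/a.txt should remain in the working tree, and the final command gives the per-tag counts asked about above.
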
Basically, feel free to use the rule that you only need to reapply
after manual edits of the $GIT_DIR/info/sparse-checkout file.

Also, with newer git, you can replace all three of

    git config core.sparsecheckout true
    vi .git/info/sparse-checkout
    git sparse-checkout reapply

with

    git sparse-checkout set --no-cone <space-separated list of patterns to insert into the .git/info/sparse-checkout file>

With older git, you can replace those three commands with two:
`git sparse-checkout init --no-cone && git sparse-checkout set <list of patterns>`.
But that's sometimes not wanted since the init command sparsifies
everything away except files in the toplevel directory, and then the
second step restores all the files, and that two-step approach is
really slow as it deletes and then restores a huge number of files
from the working directory.

> 3. Unfortunately, after executing reapply (btw it is very slow on this
> 2m files * 16k patterns scenario: 30 mins), 'git add', and 'git add
> --sparse' still hangs.

'git add --sparse' is still slow? That sounds like a bug I'd like to
investigate. What's the particular timing you get for each of 'git
add' and 'git add --sparse'? Are you giving it individual files (if
so, how many?), or directories (how many files under those
directories?), or globs? (This information will be helpful in my
attempts to get a synthetic setup aiming to be similar to yours.)

> 4. --cone is a big topic for us now, since 2.37.0 deprecates
> --no-cone. We do have our own challenges to move away from --no-cone
> (E.g. we use lots of file specifiers and/or exclusion patterns to
> define our source code shape), which will be a huge amount of work, if
> feasible. We've established a set of workflows based on --no-cone,
> because of its merit of being capable of defining a fine-grained scm
> shape.

To be fair, --no-cone is deprecated as in discouraged due to various
usability problems (including performance), but we currently have no
plans to remove it from Git.
I do heartily recommend migrating to --cone since it solves so many
problems, but we'll still support --no-cone users as best we can.

> 5. Back to this case, what we've experimented on are:
> - Remove all files/*/! patterns from our shape definition, which
> leave us with 14k directories (Obviously the scm shape no longe
> matches, but just to proof of concept here)
> - 'git sparse-checkout set <14k directories>' finishes fast

Now I'm surprised. You said in the previous email that you were using
git 2.34.2. In that version, --no-cone is the default, so this would
still be using --no-cone mode. That either suggests you switched to
v2.37 since your email and didn't include that detail here, or that
the performance issue is actually with certain specific patterns.
What version of git did you use here? And did you have either an
explicit --cone or --no-cone when using the sparse-checkout set
command?

> - 'git add' finishes fast
> As Victoria mentioned, I hope this --no-cone 'git add' performance
> can be addressed because 'those performance gains can also be realized
> in cone mode', as we saw here.

Are we sure we saw that here? Could you verify by reporting: (a) what
version of git were you using, and (b) does `git config --list | grep
-i sparse` show both core.sparsecheckout and core.sparsecheckoutcone
as being true after you do your sparse-checkout set?

Elijah
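
For reference, a rough sketch of one way to gather the version and config details asked for above in a single step, assuming a POSIX shell. Note that `sparse_report` is a hypothetical helper name used only for illustration; it is not a git command.

```shell
#!/bin/sh
# Sketch only: sparse_report is a made-up helper, not part of git.
sparse_report() {
    # $1: path to the repository (defaults to the current directory)
    repo=${1:-.}
    # (a) which git is actually being run
    git version
    # (b) sparse-related settings; prints nothing if none are set
    git -C "$repo" config --list | grep -i sparse || true
}
```

Running `sparse_report /path/to/repo` right after the 'git sparse-checkout set' step should show whether core.sparsecheckoutcone ended up true. For the timings, prefixing the command with the shell's `time` (for example `time git add --sparse <paths>`) is enough.
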