On 6/4/2020 4:50 PM, Shaun Case wrote: > Hi Elijah, Derrick, > > On Thu, Jun 4, 2020 at 12:27 AM Elijah Newren <newren@xxxxxxxxx> wrote: >> >> Hi, >> >> On Wed, Jun 3, 2020 at 8:36 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: >>> >>> On 6/3/2020 12:37 AM, Elijah Newren wrote: >>>> I think it'd be more natural to run >>> >>>> git clone --filter=blob:none --sparse >>>> https://github.com/r-spacex/launch-timeline.git >>> >>>> in place of the combination of >>> >>>> git clone --filter=blob:none --no-checkout >>>> https://github.com/r-spacex/launch-timeline.git >>>> git sparse-checkout init --cone >>> >>>> since the --sparse flag was added just for this kind of case -- to do >>>> a clone but start with only a few things checked out. It's easier, is >>>> the route we're moving towards, and as a bonus also happens to work. >>> >>> Just one warning: the --sparse option in "git clone" does not currently >>> enable core.sparseCheckoutCone, so running "git sparse-checkout init --cone" >>> afterwards is a good idea, or else your "git sparse-checkout (set|add)" >>> commands will not behave the way you expect. > > Ok, great. My production script now does this: > > git clone --filter=blob:none --sparse <URL> > git sparse-checkout init --cone > git sparse-checkout add <PATH1> > git sparse-checkout add <PATH2> > [ ... ] > > ... and everything works as expected in 2.26.2 and 2.27.0. Perfection! > > See below for why I was doing it the other way. > >>> (I think that I will propose a change in behavior to make it do so during >>> this release cycle.) > > That sounds good to me. Could you also consider adding an error > message for git sparse-checkout init, if something unexpected follows? My > rationale is that my script initially contained this: > > git sparse-checkout init cone > > Note the missing "--" in front of cone. This failed silently, so I > thought I was in > cone mode, but wasn't. An error message would have helped. Your proposal, > having it happen automatically, would too. Yes, this is probably a bug that's my fault. The option parser should be more strict in the "init" case. >>>> A bit of a side note, or a few of them, but this command of yours is broken: >>>> git sparse-checkout set README.md >>>> because --cone mode means you are specifying *directories* that should >>>> be checked out. > > Thank you for pointing this out. This is a surprise. I'm migrating some large > Subversion repos and workspace setup scripts to git (thus the interest > in partial > cloning and sparse checkouts), and svn has the ability to sparse checkout > individual files as well as directories, so I expected it to work the > same way in git. > > Is this limitation a design decision, a technical limitation, a > planned feature? It > seems to be fairly popular in svn, and there is even more interest > for it in git: > > https://stackoverflow.com/questions/122107/checkout-one-file-from-subversion > https://stackoverflow.com/questions/2466735/how-to-sparsely-checkout-only-one-single-file-from-a-git-repository There are a couple reasons why this is tricky in Git. The first is due to the .gitignore syntax. The syntax allows exact matches for _directories_ using a trailing slash "/". For example, we can match everything in A/B/C with the pattern A/B/C/ This would match the files in A/B/C/ and its subdirectories, but will not match a file A/B/C.txt or A/B/C1/. There is no equivalent matching for files, so A/B/C _will_ match a file A/B/C and A/B/C.txt. Whether this matters to you or not depends on your file structure. The second is how the pattern-matching works for cone mode. Everything is based on prefix matches of directories, but also that if a directory is included, then all of its parent directories are included, but not recursively. All files contained directly in one of the parent directories are included automatically. Thus, if A/B/C/ is recursively added, then A/B/X.txt, A/Y.txt, and Z.txt all match in cone mode. This allows us some significant performance gains by allowing us to stop walking trees when we reach a directory that isn't one of these "parent" directories. We know that A/D/ is not in our list of "parents" so we don't need to look in there. What does this have to do with file matches? If we include A/B/C.txt as a file, then we will need to match all sibling files in A/B/! This is very similar to matching A/B/* or even A/B/C/*. This idea was brought up and debated shortly after 2.25.0 released [1] [1] https://lore.kernel.org/git/CADSBhNbbO=aq-Oo2MpzDMN2VAX4m6f9Jb-eCtVVX1NfWKE9zJw@xxxxxxxxxxxxxx/ >>>> Currently, this gives no error, it instead silently >>>> drops you back to non-cone mode, which seems bad to me. >>>> sparse-checkout should provide some kind of error -- or at very least >>>> a warning -- when you make that mistake. > > Agreed. > > I'm going to leave the following here for context, but please scroll down. > >>>> Now let's talk about the commit in question that changed behavior >>>> here. The point of sparse-checkout is never to switch branches or >>>> checkout new commits; all it does is update which paths are in the >>>> current working directory. A related point to this is it should never >>>> add or remove entries from the index and shouldn't change any hashes >>>> of files in the index. It used to violate this, at first via an >>>> implementation that was literally invoking `git read-tree -mu HEAD` in >>>> a subprocess, and then later using internal code equivalent to >>>> invoking that command in a subprocess. But by violating the >>>> leave-index-entries-alone mandate, it left folks who were in the >>>> middle of a rebase and wanted to update their sparse-checkout to >>>> include some more directories in their working tree in a precarious >>>> spot -- if they didn't update, then they didn't have the files >>>> necessary to build, and if they did forcibly update via `git read-tree >>>> -mu HEAD` then their staged changes would all get wiped out. I spent >>>> some quality time helping users recover their files and teaching them >>>> about the git storage model. >>>> >>>> So that brings us back to your original question. When you said >>>> --no-checkout, it means that there is no commit checked out and the >>>> index is empty. update_sparsity() is correctly toggling the >>>> SKIP_WORKTREE bits for the existing index entries that don't match the >>>> sparsity patterns, and it is correctly calling check_updates(). >>>> check_updates() is correctly checking for files currently in the index >>>> which have toggled to being needed in the current worktree so that it >>>> can issue downloads related to promisor packs. The problem is just >>>> that there aren't any index entries to begin with, so there are no >>>> SKIP_WORKTREE bits to update, and thus no files that need to be >>>> downloaded. >>>> >>>> It seems a bit risky to make sparse-checkout start doing >>>> checkout/switch behavior and adding entries to the index. There's a >>>> couple ways forward. One, we could decide this is a special edge or >>>> corner case where we allow it: if the index is completely empty, then >>>> there's no data to lose and thus we could make `git sparse-checkout >>>> init [--cone]` in that one case use the old 'read-tree -mu HEAD' >>>> logic. Alternatively, we could just require users to run 'git reset >>>> --hard' at the end of your script. >>>> >>>> Stolee: Thoughts? >>> >>> I agree that using "--sparse" instead of "--no-checkout" is the >>> best way forward for now, but I'll classify that as a workaround >>> and not necessarily the end of the conversation. >> >> Agreed. > > Agree too. I also agree that Elijah's changes are necessary, and will save me > a ton of headaches going forward. I don't want them backed out. > > So, it seems --sparse instead of --no-checkout is what I should have been doing > from the beginning, I had to fiddle a bit to get what I had working, > by following > some online resources and experimenting, and what I ended up with working > satisfactorily was --no-checkout, with 2.26.2. > > Why? > > Perhaps the messaging has inadvertently become a little muddled. I started > by working on getting partial cloning working; when I searched, these are the > resources I found and followed: > > https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/ > https://github.blog/2020-01-13-highlights-from-git-2-25/ > https://stackoverflow.com/questions/4114887/is-it-possible-to-do-a-sparse-checkout-without-checking-out-the-whole-repository > https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/ > https://docs.gitlab.com/ce/topics/git/partial_clone.html > > The first four mention --no-checkout, but do not mention --sparse. The last one > actually gives proper guidance, but by the time I came across it, I had the > --no-checkout method working and I came away not even realizing --sparse > exists. Perhaps I'm the only one who went down this path. If not, the > --no-checkout case may be a bit more common than you might expect. There's a good reason for this. In v2.25.0, the "--sparse" option was present but broken. See [1] for the fix which was released in v2.26.0 as 47dbf10d8 (clone: fix --sparse option with URLs, 2020-01-24). [1] https://lore.kernel.org/git/4991a51f6d5d840eaa3bb830e68f1530c2ee08e4.1579900782.git.gitgitgadget@xxxxxxxxx/ >>> In general, the commit in question is doing something extremely >>> valuable for common situations, like rebase that you mention. >>> I also think that this change in behavior is warranted by the >>> clear warning placed at the top of the docs [1]: >>> >>> THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE >>> BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE- >>> CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. >>> >>> [1] https://git-scm.com/docs/git-sparse-checkout#_description > > Got it. Consider your butts covered. :) :D Thanks, -Stolee