On Mon, Apr 20, 2020 at 2:21 PM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> Hi,
>
> I posted an "Is this possible?" question on stackoverflow
> (https://stackoverflow.com/q/61326025/74296) and was pointed here.
>
> I understand from recent updates that there is increasing built-in
> support for large files and large repos, between some of the older
> capabilities (sparse checkout in general and shallow clone), and the
> newer ones (partial-clone and git-sparse-checkout).
>
> I'm playing with a large repo, and finding some "rough edges" around
> large diffs (eg 200,000 files "added" in the "initial" commits of
> shallow clones).
>
> I was hoping these could be smoothed out when using sparse checkout
> (where each user would only see say 30,000 of those 200,000 files),
> but can't figure out a way to easily & consistently apply the
> .git/info/sparse-checkout specification to tools like git-diff and
> git-log (across many users with some semblance of consistency).
>
> Is this something that is or is expected to be supported at some point?

Yes, we would like to support this at some point. See
https://lore.kernel.org/git/xmqq7dz938sc.fsf@xxxxxxxxxxxxxxxxxxxxxx/
and a bunch of other emails from that thread. You may need to set a
config setting, though (see e.g.
https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@xxxxxxxxxxxxxx/
from that thread). Also, there is no plan at all for when this will
happen.

You'll note those links are kind of recent. These issues have also
come up before, but I'm too lazy to dig up the links to the other
threads.

> While I'm asking, I have two less-important questions:
>
> 1) Are there any plans to support a filter along the lines of "keep
> blobs used for commits since date X handy"? I know I can do a shallow
> clone, then turn on filtering/promisors, and then unshallow, but then
> later fetches don't bring in binaries - a mode that provides this
> "full commit history but recent blobs only" might be nice? (I imagine
> that's probably non-trivial, because the filters are probably based on
> properties of the blobs themselves... but one can dream?)

Given the context before this in your email, could you clarify what
you are asking? In particular, are you really asking for all blobs
since date X, or for blobs within your sparse cone (going back to
beginning of history), or blobs within your sparse cone since date X?

I personally don't think doing anything with shallow clones other than
avoiding breaking existing usecases has any value. So, I'll focus on
partial clones.

I've been trying to win some mindshare for the second of those options
(having the ability to specify sparsity cones to clone/fetch and have
it respect those and only download blobs touching those paths, plus
all commits and maybe all trees), and perhaps the others could be
added on top. I'm planning to help out with this, after my merge
work, but who knows when that finishes.

> 2) Is there a target date for when git-sparse-checkout will become
> non-experimental?

We're more feature based than date based. I was one of the ones
asking that we put that loud this-is-experimental warning in the docs,
and in particular mentioning that other commands (diff, log, grep,
clone, fetch, etc.) could change in the presence of sparse-checkouts
precisely because I want to see some of the above things fixed and
even have some ideas for merge/rebase/cherry-pick in this area.

You're likely to see some commands start gaining support to work
better in a sparse-checkout (e.g. Matheus posted some patches to make
grep better respect those), and more commands slowly gain it over
time. Once enough have it and we've worked out the known bugs with
sparse-checkouts (we have some significant patches in 'next' that 2.26
users haven't seen yet), then we'll discuss when it's time to remove
the experimental warning.

> Thanks for any help, my apologies if my questions are too forward.

Sorry that the answer amounts to "we don't have that yet", but the things you are asking for are things we've been discussing and moving towards.
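
A possible stopgap for the diff/log question, until those commands
respect the sparsity patterns themselves, is to pass the cone
directories as explicit pathspecs. The Python sketch below is only an
illustration, not something git provides; the helper names
(cone_pathspecs, limited_command, run_limited) are made up for this
example. It assumes cone mode, where `git sparse-checkout list` prints
one directory per line, and paths that need no quoting. It is also
only an approximation: cone mode additionally checks out the files
sitting directly in the top-level directory and in the parent
directories of the cone, and those are not covered here.

```python
import subprocess
from typing import List


def cone_pathspecs(list_output: str) -> List[str]:
    """Turn the output of `git sparse-checkout list` into pathspecs.

    Assumes cone mode (one directory per line); blank lines are skipped.
    """
    return [line.strip() for line in list_output.splitlines() if line.strip()]


def limited_command(subcommand: str, pathspecs: List[str], *options: str) -> List[str]:
    """Build an argv such as: git log --oneline -- dir1 dir2."""
    return ["git", subcommand, *options, "--", *pathspecs]


def run_limited(subcommand: str, *options: str) -> str:
    """Run a git subcommand restricted to the current cone directories.

    Must be called from inside a repository with sparse-checkout enabled.
    """
    listed = subprocess.run(
        ["git", "sparse-checkout", "list"],
        check=True, capture_output=True, text=True,
    ).stdout
    argv = limited_command(subcommand, cone_pathspecs(listed), *options)
    return subprocess.run(
        argv, check=True, capture_output=True, text=True
    ).stdout
```

From inside a sparse checkout, `print(run_limited("log", "--oneline"))`
would then list only commits touching the cone directories. Since each
user derives the pathspecs from their own sparse-checkout
configuration, this gives some consistency across users without any
shared wrapper configuration.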