Re: Git and sparse-checkout on large monorepos - hiding irrelevant changes for a sparse-checkout specification?

On Mon, Apr 20, 2020 at 2:21 PM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> Hi,
>
> I posted an "Is this possible?" question on stackoverflow
> (https://stackoverflow.com/q/61326025/74296) and was pointed here.
>
> I understand from recent updates that there is increasing built-in
> support for large files and large repos, between some of the older
> capabilities (sparse checkout in general and shallow clone), and the
> newer ones (partial-clone and git-sparse-checkout).
>
> I'm playing with a large repo, and finding some "rough edges" around
> large diffs (eg 200,000 files "added" in the "initial" commits of
> shallow clones).
>
> I was hoping these could be smoothed out when using sparse checkout
> (where each user would only see say 30,000 of those 200,000 files),
> but can't figure out a way to easily & consistently apply the
> .git/info/sparse-checkout specification to tools like git-diff and
> git-log (across many users with some semblance of consistency).
>
> Is this something that is or is expected to be supported at some point?

Yes, we would like to support this at some point.  See
https://lore.kernel.org/git/xmqq7dz938sc.fsf@xxxxxxxxxxxxxxxxxxxxxx/
and a bunch of other emails from that thread.  You may need to set a
config setting, though (see e.g.
https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@xxxxxxxxxxxxxx/
from that thread).

Also, there is no plan at all for when this will happen.  You'll note
those links are kind of recent.  These issues have also come up
before, but I'm too lazy to dig up the links to the other threads.
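
In the meantime, one rough approximation with existing commands is to
pass the sparse-checkout directories to log (or diff) as pathspecs.
The sketch below assumes cone mode, where `git sparse-checkout list`
prints one directory per line; `sparse_log` is a hypothetical helper
name, not a git built-in.  It ignores files in the repository root
(which cone mode always includes) and directory names that git has to
quote, so treat it as an illustration rather than the eventual
built-in behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): show only the commits that touch
# the directories in the current cone-mode sparse-checkout.
sparse_log() {
    git sparse-checkout list | {
        set --                          # start with an empty pathspec list
        while IFS= read -r dir; do
            # ":(top)" anchors the pathspec at the repository root, so the
            # helper also works when run from a subdirectory.
            set -- "$@" ":(top)$dir"
        done
        # Caveat: if the list is empty (e.g. the worktree is not sparse),
        # this falls back to an unrestricted log.
        git log --oneline -- "$@"
    }
}
```

Calling `sparse_log` inside a worktree whose cone is set to `a` would
then list only commits touching `a/`.  The same pathspec list could be
handed to `git diff` in the same way.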

> While I'm asking, I have two less-important questions:
>
> 1) Are there any plans to support a filter along the lines of "keep
> blobs used for commits since date X handy"? I know I can do a shallow
> clone, then turn on filtering/promisors, and then unshallow, but then
> later fetches don't bring in binaries - a mode that provides this
> "full commit history but recent blobs only" might be nice? (I imagine
> that's probably non-trivial, because the filters are probably based on
> properties of the blobs themselves... but one can dream?)

Given the context before this in your email, could you clarify what
you are asking?  In particular, are you really asking for all blobs
since date X, or for blobs within your sparse cone (going back to
beginning of history), or blobs within your sparse cone since date X?

I personally don't see any value in doing anything with shallow clones
beyond not breaking existing use cases.  So, I'll focus on partial
clones.

I've been trying to win some mindshare for the second of those options
(having the ability to specify sparsity cones to clone/fetch and have
it respect those and only download blobs touching those paths, plus
all commits and maybe all trees), and perhaps the others could be
added on top.  I'm planning to help out with this, after my merge
work, but who knows when that finishes.
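
For comparison, the closest thing available with existing commands is
a blobless partial clone combined with a cone-mode sparse-checkout.
This is only a sketch: `sparse_partial_clone` is a hypothetical helper
name, and the URL, destination, and directories are whatever the
caller supplies.  It is not the path-aware clone/fetch described
above, since blobs outside the cone are still fetched on demand
whenever a command needs them (e.g. `git log -p` across the whole
tree).

```shell
#!/bin/sh
# Hypothetical helper (not part of git): blobless partial clone whose
# working tree is limited to the given cone-mode directories.
# Usage: sparse_partial_clone <url> <dest> <dir>...
sparse_partial_clone() {
    url=$1
    dest=$2
    shift 2
    # --filter=blob:none  fetch all commits and trees, blobs only on demand
    # --sparse            initially check out only the top-level files
    git clone --filter=blob:none --sparse "$url" "$dest" &&
    git -C "$dest" sparse-checkout init --cone &&
    # Setting the cone checks out the requested directories; their blobs
    # are downloaded from the promisor remote at this point.
    git -C "$dest" sparse-checkout set "$@"
}
```

The server has to allow filtered fetches for the `--filter` option to
take effect; otherwise git warns and falls back to a full clone.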

> 2) Is there a target date for when git-sparse-checkout will become
> non-experimental?

We're more feature-based than date-based.  I was one of those who
asked for the loud this-is-experimental warning in the docs, and in
particular for it to mention that other commands (diff, log, grep,
clone, fetch, etc.) could change behavior in the presence of
sparse-checkouts, precisely because I want to see some of the above
things fixed and even have some ideas for merge/rebase/cherry-pick in
this area.
You're likely to see some commands start gaining support to work
better in a sparse-checkout (e.g. Matheus posted some patches to make
grep better respect those), and more commands slowly gain it over
time.  Once enough have it and we've worked out the known bugs with
sparse-checkouts (we have some significant patches in 'next' that 2.26
users haven't seen yet), then we'll discuss when it's time to remove
the experimental warning.

> Thanks for any help, my apologies if my questions are too forward.

Sorry that the answer amounts to "we don't have that yet", but the
things you are asking for are things we've been discussing and moving
towards.


