Re: [PATCH] sparse-checkout.txt: new document with sparse-checkout directions

ZheNing Hu <adlternative@xxxxxxxxx> · Sat, 15 Oct 2022 22:49:27 +0800

On Sat, Oct 15, 2022 at 12:38 Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Fri, Oct 14, 2022 at 7:17 PM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 6, 2022 at 15:53 Elijah Newren <newren@xxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 30, 2022 at 2:54 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Sep 28, 2022 at 13:38 Elijah Newren <newren@xxxxxxxxx> wrote:
> > > > >
> [...]
> > > As an example, the repository where we first applied sparse-checkouts
> > > to (and which had the complicated dependencies) does not use partial
> > > clones or a sparse-index.   While partial clone and sparse-index might
> > > help a little, the .git directory for a full clone is merely 2G, and
> > > there are less than 100K entries in the index.  However,
> > > sparse-checkout helps out a lot.
> >
> > Yes, you explain well here that we don't necessarily need
> > to apply all of these features. But I am still a little confused: Where
> > does the time savings come from? Is it saved by the time reduction of
> > git checkout? Or is it the reduction of some unnecessary working tree scans
> > during test/build time?
>
> It is neither git checkout time, nor tree scans; it's the ability to
> avoid building large parts of the project coupled with the
> significantly better responsiveness of IDEs when project scope is
> limited.  When directories are entirely missing, we don't need to
> build any of the code in those directories and can instead just use
> already built artifacts from the most recent point in history that has
> been built on our continuous integration infrastructure.  (Note: our
> sparsification tool will keep any modules/directories where there have
> been modifications since the most recent upstream commit that has been
> built, so we don't risk getting a wrong build via this strategy.)
>

So these users build and test only a few projects, and save time by not
building or testing the other projects. This is reasonable.
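
To check my understanding of the "keep anything modified since the last
built commit" rule, here is a minimal sketch in Python. It is not the actual
sparsification tool; the function names are hypothetical, and it assumes
that a "module" is simply the top-level directory of a path.

```python
from pathlib import PurePosixPath


def module_of(path):
    """Return the module a repo-relative path belongs to, or None.

    Assumption for this sketch: a module is the first path component.
    Files directly in the repository root have no module; cone-mode
    sparse-checkout keeps root-level files anyway.
    """
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) >= 2 else None


def modules_to_keep(requested, changed_paths):
    """Modules that must stay in the working tree.

    requested     -- modules the user asked to work on
    changed_paths -- paths modified since the most recent commit that
                     was built upstream (hypothetical input)
    """
    modified = {module_of(p) for p in changed_paths}
    modified.discard(None)
    # Modified modules cannot reuse prebuilt artifacts, so keep them too.
    return sorted(set(requested) | modified)
```

The list of changed paths could come from something like
`git diff --name-only <last-built-commit> HEAD`, and the result could be
passed to `git sparse-checkout set`. Every module left out would then rely
on the artifacts already built for that commit.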

> [...]
> > > > 1. mount the large git repo on the server to local.
> > > > 2. just ssh to a remote server to run integration tests.
> > > > 3. use an external tool to run integration tests on the remote server.
> > >
> > > Are you suggesting #1 as a way for just handling the git history, or
> > > also for handling the worktree with some kind of virtual file system
> > > where not all files are actually written locally?  If you're only
> > > talking about the history, then you're kind of going on a tangent
> > > unrelated to this document.  If you're talking about worktrees and
> > > virtual file systems, then Git proper doesn't have anything of the
> > > sort currently.  There are at least two solutions in this space --
> > > Microsoft's Git-VFS (which I think they are phasing out) and Google's
> > > similar virtual file system -- but I'm not currently particularly
> > > interested in either one.
> > >
> >
> > Here I mean git nfs, or some kind of git virtual file system, or some
> > git workspace. I don't really understand why they are now being
> > phased out.
>
> You'd have to ask them, or read their comments on it.  I think they
> believe sparse-checkout with a normal file system is or will be better
> than the behavior they are getting from their virtual file system (and
> they've put a lot of really good work behind making sure that is the
> case).
>

Okay.

> [...]
> > Some users may really want to focus only on their subprojects, so I think
> > "git log -p" shouldn't show files that don't satisfy the
> > sparse-checkout patterns,
> > and "git grep" too. But some users may need to search something globally,
> > and I think those people are in the minority, so maybe there should be a
> > "git log -p --scrope=all" or "git grep --scrope=all" for them.
>
> Good to know you're in the "Behavior A" camp and we've got another
> vote for implementing things in that direction.  A couple of small
> points, though:
>   * It's --scope rather than --scrope.  ;-)
>   * I have to disagree here slightly about people using a --scope=all
> flag -- I don't think users should have to specify it with every grep
> or log invocation.  Users in the "Behavior B" camp would want
> `--scope=all` behavior for nearly every grep and log -p invocation
> they make; it's annoying and unfair to force them to spell it out
> every time.  So, I think we need a configuration option.
>

Fine, this configuration looks like it can balance the needs of both camps.
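
As I understand the idea, an explicit --scope on the command line would
override the configured value, which would override a built-in default. A
minimal sketch in Python follows. The function names are hypothetical, the
"sparse" default is only my assumption, and the path check is a simplified
directory-prefix match rather than real sparse-checkout pattern matching.

```python
VALID_SCOPES = ("sparse", "all")


def resolve_scope(cli_scope=None, config_scope=None, default="sparse"):
    """Pick the effective scope: command line, then config, then default."""
    for value in (cli_scope, config_scope, default):
        if value is not None:
            if value not in VALID_SCOPES:
                raise ValueError(f"unknown scope: {value!r}")
            return value
    raise ValueError("no scope given")


def in_scope(path, scope, sparse_dirs):
    """Should a command such as grep or log -p consider this path?"""
    if scope == "all":
        return True
    # Simplified: a path is in scope if it lies under a listed directory.
    return any(path.startswith(d.rstrip("/") + "/") for d in sparse_dirs)
```

With this, a "Behavior B" user would set the configuration to "all" once,
and a "Behavior A" user could still pass --scope=all for an occasional
global search.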

Thanks,
ZheNing Hu