Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matheus,

On Wed, Jun 10, 2020 at 2:15 PM Matheus Tavares Bernardino
<matheus.bernardino@xxxxxx> wrote:
>
> On Tue, Jun 2, 2020 at 11:40 PM Elijah Newren <newren@xxxxxxxxx> wrote:
> >
> > On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino
> > <matheus.bernardino@xxxxxx> wrote:
> > >
> >
> > Moving it to grep's manpage seems ideal to me.  grep's behavior should
> > be defined in grep's manual.
> >
> > > sparse.restrictCmds::
> > > See complete definition in linkgit:git-config[1]. In grep, the
> > > restriction takes effect in three cases: with --cached; when a
> > > commit-ish is given; when searching a working tree where some paths
> > > excluded by the sparsity patterns are present (e.g. manually created
> > > paths or not removed submodules).
> >
> > That looks more than a little confusing.  Could this definition be
> > something more like "See base definition in linkgit:git-config[1].
> > grep honors sparse.restrictCmds by limiting searches to the sparsity
> > paths in three cases: when searching the working tree, when searching
> > the index with --cached, or when searching a specified commit"
>
> Yes, this looks better, thanks. I would only add a brief explanation
> on what we mean by limiting the search in the working tree case.

Possibly, but I think it would be easy to go overboard here.

> Since
> the working tree should already contain only the sparse paths (in most
> cases), I think this sentence may sound a little confusing without
> some explanation.

That's an interesting flag.  I'm curious, though, would they be
confused by it, or would it just seem immediately obvious and almost
not worth mentioning?  In other words, would they think "Well, if you
use sparse-checkout to get just a subset of files checked out, it
totally makes sense that grep would be limited in that case.  Why do
they even need to mention it -- just for completeness, I guess?"

And even if not all users think that way, would a large percentage of
users around them think that way and point out the obviousness of the
docs?

If not, maybe we just add a "(obviously)" comment right after "working tree"?

> Even further, some users might expect that `git -c
> sparse.restrictCmds=false grep $pattern` would restore the previous
> behavior of falling back to the cache for non-present entries, which
> is not true.

10 years from now, I don't want our docs to consist of a long
explanation of all the bugs that existed in various ancient versions
of git and how modern behavior differs from each previous iteration.
There are times when it's worth calling out bugs in prior versions to
bring it to the attention of our users, but I don't see how this is
one of them.  The previous behavior was just outright buggy and
inconsistent, and from my viewpoint, was also a regression.  I think
it should have been reverted regardless of your series, though
skip_worktree stuff was dormant and went unused for a really long
time.

Also, this is a special area of git where focusing too much on
backward compatibility might actually be detrimental.  Backward
compatibility is a really good goal to keep in mind in general, but
the SKIP_WORKTREE usability was traditionally really, really bad -- so
much so that outright replacing was contemplated by its author[A], and
we placed a HUGE ALL CAPS DISCLAIMER in the documentation of
sparse-checkout about how users should expect the behavior of commands
to change[B].  So, unlike other areas of git, we should focus on
getting sparse-checkout behavior right more than on bug compatibility
with previous code and long migration stories.  Given the context of
such disclaimers and changes, the idea of trying to document those
changes makes me think that in the not too distant future we would
have the equivalent of the following humorous driving directions from
the era before smartphones: "To get to Joe's place, you turn right on
the first road after where Billy's Barn burned down 5 years ago..."
(when the burned Barn was cleared out 4 years ago and there's no
indication of where it once was)

[A] https://lore.kernel.org/git/CABPp-BGE-m_UFfUt_moXG-YR=ZW8hMzMwraD7fkFV-+sEHw36w@xxxxxxxxxxxxxx/
[B] https://git-scm.com/docs/git-sparse-checkout#_description

> In particular, I would like to emphasize that the use for
> `sparse.restrictCmds=false` in the working tree case, is for
> situations like the one you described in [1]:
>
> * uses sparse-checkout to remove a bunch of files/directories they
> don't care about
> * creates a new file that happens to have the same name as an
> (unfortunately) generically worded filename that exists in the index
> (but is marked SKIP_WORKTREE and had previously been removed)
>
> In this situation, grep would ignore the said file by default, but
> search it with `sparse.restrictCmds=false`.

I think this is such a weird and unusual case that I'm not sure it
merits mentioning in the docs.

But if others disagree and think this case is worth mentioning in the
docs, then it shouldn't just be mentioned in "git grep".  All affected
manpages should be updated to discuss how they handle this obscure
corner case.  For example, `git diff` and `git status` just ignore
these files and do not print out any information about them.  So it's
kind of like these files are ignored...but even `git status --ignored`
won't show anything about such files.

Anyway, I think this is a pretty obscure case whose discussion would
dilute the value of the manual in teaching people the basics of
commands.

> So what do you think of the following:
>
> sparse.restrictCmds::
> See base definition in linkgit:git-config[1]. grep honors
> sparse.restrictCmds by limiting searches to the sparsity paths in
> three cases: when searching the working tree, when searching the index
> with --cached, and when searching a specified commit.

Good up to here.  I think I'd like to use just this text as-is (or
maybe with the "(obviously)" addition) and then see if we get feedback
that we need clarifications, because I'm worried our attempts at
clarifying might backfire.  For example...

> Note: when this
> option is set to true (default), the working tree search will ignore
> paths that are present despite not matching the sparsity patterns.

You've run into the same problem Stolee and I did by trying to provide
details about one case, but overlooking others.  ;-)  This "Note:"
statement is not correct; there's a couple cases it gets wrong:

merge/rebase/cherry-pick can unset the SKIP_WORKTREE bit even for
paths that do not match the sparsity patterns in order to be able to
materialize a file and show conflicts.  In fact, they are allowed to
unset the bit for other files and materialize them too (see
https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/).
Such paths, despite not matching the sparsity patterns, will not have
the SKIP_WORKTREE bit set.  And it is the SKIP_WORKTREE bit, rather
than the sparsity patterns, that git-grep uses for deciding which
files in the working tree to search.

Also, if someone runs sparse-checkout init/set, and sparse-checkout
would normally remove some file but notices that the file has local
modifications, then sparse-checkout will avoid removing the file AND
will avoid setting the SKIP_WORKTREE bit on that file.  See commit
681c637b4a ("unpack-trees: failure to set SKIP_WORKTREE bits always
just a warning", 2020-03-27)

> This can happen, for example, if you create a new file in a path that
> was previously removed by git-sparse-checkout.

This is that obscure corner case discussed above.

> Or if you don't
> deinitialize a submodule that is excluded by the sparsity patterns
> (thus remaining in the working copy, anyway).

This case requires more thought.  If a submodule doesn't match the
sparsity patterns, we already said elsewhere that sparse-checkout
should not remove the submodule (since doing so would risk data loss).
But do we set the SKIP_WORKTREE bit for it?  Generally,
sparse-checkout avoids removing files with modifications, and if it
doesn't remove them it also doesn't set the SKIP_WORKTREE bit.  For
consistency, should sparse-checkout not set SKIP_WORKTREE for
initialized submodules?

If we don't set the SKIP_WORKTREE bit for initialized submodules, then
we don't actually have a second different case to mention here.

Granted, that's more an issue for `sparse-checkout` than `grep`.


Hope that all helps.  Let me know if it doesn't, if you disagree with
any parts, or some parts aren't clear.
Elijah



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux