Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Thu, May 21, 2020 at 10:36 AM Matheus Tavares Bernardino
> <matheus.bernardino@xxxxxx> wrote:
> >
> > On Thu, May 21, 2020 at 4:26 AM Elijah Newren <newren@xxxxxxxxx> wrote:
> > >
> > > On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > > >
> > > > Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> writes:
> > > >
> > > > > The idea behind not skipping gitlinks here was to be compliant with
> > > > > what we have in the working tree. In 4fd683b ("sparse-checkout:
> > > > > document interactions with submodules"), we decided that, if the
> > > > > sparse-checkout patterns exclude a submodule, the submodule would
> > > > > still appear in the working tree. The purpose was to keep these
> > > > > features (submodules and sparse-checkout) independent. Along the same
> > > > > lines, I think we should always recurse into initialized submodules in
> > >
> > > Sorry if I missed it in the code, but do you check whether the
> > > submodule is initialized before descending into it, or do you descend
> > > into it based on it just being a submodule?
> >
> > We only descend if the submodule is initialized. The new code in this
> > patch doesn't do this check, but it is already implemented in
> > grep_submodule() (which is called by grep_tree() and grep_cache() when
> > a submodule is found).
>
> Good to know.  To up the ante a bit: What if another branch has
> directory that doesn't exist in HEAD or the current checkout, and
> within that directory is a submodule.  Would it be recursed into?

In this case, `git grep --recurse-submodules <pattern> $branch` will
recurse into the submodule, but only if it has already been
initialized. I.e. if we have checked out to $branch, ran `git
submodule init` and then checked out back.

> What if it matched the sparsity paths?  (Is it even possible to
> recurse into it?)

That's a great question. The idea that I tried to implement is to
always recurse into _initialized_ submodules (even the ones excluded
by the superproject's sparsity patterns) and, then, follow their own
sparsity patterns inside. I'm not necessarily in favor (or against)
this behavior, but this seemed to be the most compatible way with the
design we describe in our docs:

"If your sparse-checkout patterns exclude an initialized submodule,
then that submodule will still appear in your working directory." (in
git-sparse-checkout.txt)

So, back to the original question, if you run `git grep
--recurse-submodules <pattern> $branch` and $branch contains a
submodule which was previously initialized, git-grep _would_ recurse
into it, even if it (or its parent dir) was excluded. However, your
question helped me notice an inconsistency in my patch: the behavior I
just described is working for the full pattern set, but not in cone
mode. That's because, in cone mode, we can mark the whole submodule's
parent dir as excluded. Then, path_matches_pattern_list() will return
NOT_MATCHED for the parent dir and we won't recurse into it, so we
won't even get to the submodule's path to discover that it refers to a
gitlink.

Therefore, if we decide to keep the behavior of always recursing into
submodules, we will need some extra work for the cone mode. I.e.
grep_tree() will have to check if NOT_MATCHED directories contain
submodules before discarding them, and recurse only into the
submodules if so. As for the implementation, the first idea that came
to my mind was to list the submodules' pathnames and do prefix
matching for each submodule and NOT_MATCHED dir. But the places I've
seen such submodule listings in the code base so far [1] seem to work
only in the current branch. My second idea was to continue the tree
walk when we hit NOT_MATCHED dir entries, but not doing any work, just
looking for possible gitlinks to recurse into. I'm not sure if that
could negatively affect the execution time, though.

Does this seem like a good approach? Or is there another solution that
I have not considered? Or even further, should we choose to skip the
submodules in excluded paths? My only concern in this case is that it
would be contrary to the design in git-sparse-checkout.txt. And the
working tree grep and cached grep would differ even on a clean working
tree.

[1]: builtin/submodule--helper.c:module_list_compute() and
submodule-config.c:config_from_gitmodules()



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux