Re: [PATCH v1 1/2] builtin/grep.c: add --sparse option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi reviewrs,

I came back from busying with relocation :)

On 8/17/2022 10:12 PM, Derrick Stolee wrote:
> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>> Add a --sparse option to `git-grep`. This option is mainly used to:
>>
>> If searching in the index (using --cached):
>>
>> With --sparse, proceed the action when the current cache_entry is
>
> This phrasing is awkward. It might be better to reframe to describe the
> _why_ before the _what_
>
>   When the '--cached' option is used with the 'git grep' command, the
>   search is limited to the blobs found in the index, not in the worktree.
>   If the user has enabled sparse-checkout, this might present more results >   than they would like, since the files outside of the sparse-checkout are
>   unlikely to be important to them.
>
>   Change the default behavior of 'git grep' to focus on the files within
>   the sparse-checkout definition. To enable the previous behavior, add a
>   '--sparse' option to 'git grep' that triggers the old behavior that
>   inspects paths outside of the sparse-checkout definition when paired
>   with the '--cached' option.

Good suggestion!

> Or something like that. The documentation updates will also help clarify
> what happens when '--cached' is not included. I assume '--sparse' is
> ignored, but perhaps it _could_ allow looking at the cached files outside
> the sparse-checkout definition, this could make the simpler invocation of
> 'git grep --sparse <pattern>' be the way that users can search after their
> attempt to search the worktree failed.

This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.

I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."

To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.

>> marked with SKIP_WORKTREE bit (the default is to skip this kind of
>> entry). Before this patch, --cached itself can realize this action.
>> Adding --sparse here grants the user finer control over sparse
>> entries. If the user only wants to peak into the index without
>
> s/peak/peek/
>
>> caring about sparse entries, --cached should suffice; if the user
>> wants to peak into the index _and_ cares about sparse entries,
>> combining --sparse with --cached can address this need.
>>
>> Suggested-by: Victoria Dye <vdye@xxxxxxxxxx>
>> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx>
>> ---
>>  builtin/grep.c                  | 10 +++++++++-
>>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
>>  2 files changed, 15 insertions(+), 7 deletions(-)
>
> You mentioned in Slack that you missed the documentation of the --sparse
> option. Just pointing it out here so we don't forget.

Will do.

>>
>> diff --git a/builtin/grep.c b/builtin/grep.c
>> index e6bcdf860c..61402e8084 100644
>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
>>
>>  static int skip_first_line;
>>
>> +static int grep_sparse = 0;
>> +
>
> I initially thought it might be good to not define an additional global,
> but there are many defined in this file outside of the context and they
> are spread out with extra whitespace like this.
>
>>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
>>  {
>>      if (opt->binary != GREP_BINARY_TEXT)
>> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
>>      for (nr = 0; nr < repo->index->cache_nr; nr++) {
>>          const struct cache_entry *ce = repo->index->cache[nr];
>>
>> -        if (!cached && ce_skip_worktree(ce))
>
> This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
> was provided.
>
>> +        /*
>> +         * If ce is a SKIP_WORKTREE entry, look into it when both
>> +         * --sparse and --cached are given.
>> +         */
>> +        if (!(grep_sparse && cached) && ce_skip_worktree(ce))
>>              continue;
>
> The logic of this if statement is backwards from the comment because a
> true statement means "skip the entry" _not_ "look into it".
>
>     /*
>      * Skip entries with SKIP_WORKTREE unless both --sparse and
>      * --cached are given.
>      */

Got it.

> But again, we might want to consider this alternative:
>
>     /*
>      * Skip entries with SKIP_WORKTREE unless --sparse is given.
>      */
>     if (!grep_sparse && ce_skip_worktree(ce))
>         continue;
>
> This will require further changes below, specifically this bit:
>
>             /*
>              * If CE_VALID is on, we assume worktree file and its
>              * cache entry are identical, even if worktree file has
>              * been modified, so use cache version instead
>              */
>             if (cached || (ce->ce_flags & CE_VALID)) {
>                 if (ce_stage(ce) || ce_intent_to_add(ce))
>                     continue;
>                 hit |= grep_oid(opt, &ce->oid, name.buf,
>                          0, name.buf);
>             } else {
>
> We need to activate this grep_oid() call also when ce_skip_worktree(c) is
> true. That is, if we want 'git grep --sparse' to extend the search beyond
> the worktree and into the sparse entries.
>
>>
>>          strbuf_setlen(&name, name_base_len);
>> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>>                 PARSE_OPT_NOCOMPLETE),
>>          OPT_INTEGER('m', "max-count", &opt.max_count,
>>              N_("maximum number of results per file")),
>> +        OPT_BOOL(0, "sparse", &grep_sparse,
>> +             N_("search sparse contents and expand sparse index")),
>
> This "and expand sparse index" is an internal implementation detail, not a
> heplful item for the help text. Instead, perhaps:
>
>     "search the contents of files outside the sparse-checkout definition"

Sounds good!

> (Also, while the sparse index is being expanded right now, I would expect
> to not expand the sparse index by the end of the series.)
>
>> -test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' ' >> +test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
>>      cat >expect <<-EOF &&
>>      a:text
>>      b:text
>>      dir/c:text
>>      EOF
>> -    git grep --cached "text" >actual &&
>> +    git grep --cached --sparse "text" >actual &&
>>      test_cmp expect actual
>>  '
>
> Please add a test that demonstrates the change of behavior when only --cached
> is provided, not --sparse.

Sure!

> (If you take my suggestion to allow 'git grep --sparse' to do something
> different, then also add a test for that case.)
>
>>
>> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu
>>      test_cmp expect actual
>>  '
>>
>> -test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' ' >> +test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
>>      cat >expect <<-EOF &&
>>      a:text
>>      b:text
>> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th
>>      sub/B/b:text
>>      sub2/a:text
>>      EOF
>> -    git grep --recurse-submodules --cached "text" >actual &&
>> +    git grep --recurse-submodules --cached --sparse "text" >actual &&
>>      test_cmp expect actual
>>  '
>> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a
>>      test_cmp expect actual
>>  '
>>
>> -test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' ' >> +test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' '
>>      cat >expect <<-EOF &&
>>      a:text
>>      b:text
>> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and
>>      EOF
>>      test_when_finished "git update-index --no-assume-unchanged b" &&
>>      git update-index --assume-unchanged b &&
>> -    git grep --cached text >actual &&
>> +    git grep --cached --sparse text >actual &&
>>      test_cmp expect actual
>>  '
>
> Same with these two tests. Add additional commands that show the change of
> behavior when only using '--cached'.

--
Thanks,
Shaoxuan




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux