On Mon, Sep 27 2021, Matheus Tavares wrote: > On Mon, Sep 27, 2021 at 9:09 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: >> >> >> On Mon, Aug 16 2021, Jonathan Tan wrote: >> >> > Record the repository whenever an OID grep source is created, and teach >> > the worker threads to explicitly provide the repository when accessing >> > objects. >> > [...] >> > diff --git a/grep.h b/grep.h >> > index 480b3f5bba..128007db65 100644 >> > --- a/grep.h >> > +++ b/grep.h >> > @@ -120,7 +120,20 @@ struct grep_opt { >> > struct grep_pat *header_list; >> > struct grep_pat **header_tail; >> > struct grep_expr *pattern_expression; >> > + >> > + /* >> > + * NEEDSWORK: See if we can remove this field, because the repository >> > + * should probably be per-source. That is, grep.c functions using this >> > + * field should probably start using "repo" in "struct grep_source" >> > + * instead. >> > + * >> > + * This is potentially the cause of at least one bug - "git grep" >> > + * ignoring the textconv attributes from submodules. See [1] for more >> > + * information. >> > + * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@xxxxxxxxxxxxxx/ >> > + */ >> > struct repository *repo; >> > + >> >> I ran into this comment and read the linked E-Mail, and then the >> downthread >> https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@xxxxxxxxxxxxxx/; >> >> Given Matheus's "I've somehow missed this guard and the..." there I'm >> not quite sure what/if we should be doing here & what this comment is >> recommending? I.e. do we still need to adjust the call chains as noted >> in the E-Mail the comment links to, or not? > > I think we should still adjust the call chains, yes. The downthread > message you mentioned is kind of a tangent about performance, where > Junio helped me understand something I had previously missed in the > code, regarding the persistence of the attributes stack. > > But the issue that started the thread was about a correctness problem: > the superproject textconv attributes are being used on submodules' > files when running `git grep` with `--recurse-submodules --textconv`. > The three cases to consider are: > > - .gitattributes from the working tree > - .gitattributes from the index > - .git/info/attributes > > On all these cases, the superproject attributes are being used on the > submodule. Additionally, if the superproject does not define any > attribute, the submodule attributes are being ignored in all cases > except by the first one (but that is only because the code sees the > .gitattributes file on the submodule as if it were a "regular" > subdirectory of the surperproject. So the submodule's .gitattribures > takes higher precedence when evaluating the attributes for files in > that directory). > > Another issue is that the textconv cache is always saved to (and read > from) the superproject gitdir, even for submodules' files. > > Here are some test cases that demonstrate these issues: > > -- snipsnap -- > diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh > index 3172f5b936..d01a3bc5d8 100755 > --- a/t/t7814-grep-recurse-submodules.sh > +++ b/t/t7814-grep-recurse-submodules.sh > @@ -441,4 +441,104 @@ test_expect_success 'grep --recurse-submodules with --cached ignores worktree mo > test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 && > test_must_be_empty actual > ' > + > +test_expect_failure 'grep --textconv: superproject .gitattributes does not affect submodules' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + echo "a diff=d2x" >.gitattributes && > + > + cat >expect <<-\EOF && > + a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_failure 'grep --textconv: superproject .gitattributes (from index) does not affect submodules' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + echo "a diff=d2x" >.gitattributes && > + git add .gitattributes && > + rm .gitattributes && > + > + cat >expect <<-\EOF && > + a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_failure 'grep --textconv: superproject .git/info/attributes does not affect submodules' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + super_attr="$(git rev-parse --path-format=relative --git-path info/attributes)" && > + test_when_finished rm -f "$super_attr" && > + echo "a diff=d2x" >"$super_attr" && > + > + cat >expect <<-\EOF && > + a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --textconv corectly reads submodule .gitattributes' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + echo "a diff=d2x" >submodule/.gitattributes && > + > + cat >expect <<-\EOF && > + submodule/a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_failure 'grep --textconv corectly reads submodule .gitattributes (from index)' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + echo "a diff=d2x" >submodule/.gitattributes && > + git -C submodule add .gitattributes && > + rm submodule/.gitattributes && > + > + cat >expect <<-\EOF && > + submodule/a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_failure 'grep --textconv corectly reads submodule .git/info/attributes' ' > + reset_and_clean && > + test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" && > + > + # Workaround: we use --path-format=relative because the absolute path > + # contains whitespaces and that seems to confuse test_when_finished > + # > + submodule_attr="submodule/$(git -C submodule rev-parse --path-format=relative --git-path info/attributes)" && > + test_when_finished rm -f "$submodule_attr" && > + echo "a diff=d2x" >"$submodule_attr" && > + > + cat >expect <<-\EOF && > + submodule/a:(1|2)x(3|4) > + EOF > + git grep --textconv --recurse-submodules x >actual && > + test_cmp expect actual > +' > + > +test_expect_failure 'grep saves textconv cache in the appropriated repository' ' > + reset_and_clean && > + test_config_global diff.d2x_cached.textconv "sed -e \"s/d/x/\"" && > + test_config_global diff.d2x_cached.cachetextconv true && > + echo "a diff=d2x_cached" >submodule/.gitattributes && > + > + # Note: we only read/write to the textconv cache when grepping from an > + # OID as the working tree file might have modifications. That is why > + # we use --cached here. > + # > + git grep --textconv --cached --recurse-submodules x && > + test_path_is_missing "$(git rev-parse --git-path refs/notes/textconv/d2x_cached)" && > + test_path_is_file "$(git -C submodule rev-parse --git-path refs/notes/textconv/d2x_cached)" > +' > + > test_done Thanks! I think it would be very good to have these tests in-tree along with an updated comment pointing to them.