Re: [PATCH v9 0/3] teach submodules to know they're submodules

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 11 Mar 2022 10:09:50 +0100

On Wed, Mar 09 2022, Emily Shaffer wrote:

> For the original cover letter, see
> https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.
>
> CI run: https://github.com/nasamuffin/git/actions/runs/1954710601
>
> Since v8:
>
> Only a couple of minor fixes.
>
> Junio pointed out that I could write the tests better using --type=bool
> and 'test_cmp_config', and that we could be a little more careful about
> when to give up on 'git rev-parse --show-superproject-working-dir'.
>
> Glen mentioned that builtin/submodule--helper.c:run_update_procedure() is called
> unconditionally earlier in the same function where I had added the
> config in git-submodule.sh. So, I moved the config set into
> submodule--helper.c to reduce possible edge cases where the config might
> not be set.
>
> Otherwise, this series is pretty much unchanged.
>
> Since v7:
>
> Actually a fairly large rework. Rather than keeping the path from gitdir
> to gitdir, just keep a boolean under 'submodule.hasSuperproject'. The
> idea is that from this boolean, we can decide whether to traverse the
> filesystem looking for a superproject.
>
> Because this simplifies the implementation, I compressed the three
> middle commits into one. As proof-of-concept, I added a patch at the end
> to check for this boolean when running `git rev-parse
> --show-superproject-working-tree`.
>
> One thing I'm not sure about: in the tests, I check whether the config
> is set, but not what the boolean value of it is. Is there a better way
> to do that? For example, I could imagine someone deciding to set
> `submodule.hasSuperproject = false` and the tests would not function
> correctly in that case. I think we don't really normalize the value on a
> boolean config like that, so I didn't want to write a lot of comparison
> to check if the value is 1 or true or True or TRUE or Yes or .... Am I
> overthinking it?
>
> The other thing I'm not sure about: since it's just a bool, we're not
> restricted to setting this config only when we have both gitdir paths
> available. That makes me want to set the config any time we are doing
> something with submodules anyway, like any time 'git-submodule--helper'
> is used. But that helper seems to be called in the context of the
> superproject, not of the submodules, so adding this config for each
> submodule we touch would be a second child process. Is there some other
> common entry point for submodules that we can use?

I really don't mean to bring up the same points again, but I'm still
genuinely unsure what this is intended to solve in the end.

I.e. from the original RFC we went from it being for optimizations for
the shellscript "git rev-parse", to suggestions that the configured path
would be "canonical" in a way we couldn't discover on-the-fly (i.e. some
of Jonathan's noted edge cases [1]).

But now it's a boolean indicating "it's there, discover it", and the
implied (but not really explicitly stated) reason in 2/3 is that it's
purely for optimization purposes at this point.

But it's an optimization without a benchmark.

In [1] Jonathan (if I understood it correctly, see [2]) might have
suggested this is important to deal with some Google in-house NFS-a-like
auto-mounting software, i.e. the "walking up" is truly expensive in some
scenarios.

I do worry a bit that we'll be creating behavior edge cases related to
this. If the problem being solved only affects a relatively obscure
setup, is it worth it? And in that case, perhaps there should be an "I
need this optimization" setting guarding it?

But I don't know, a concrete case where this series makes a difference
would really help.

I tried to come up with one before[3] and all I could find was fleeting
cases we'd see go away with the migration of the remaining parts of
git-submodule.sh to C, which we already have in-flight patches for (or
rather, Glen is AFAIK at series 1/2 of submitting those, with 1/2
in-flight).

In any case I think lifting the bits of [3] where we assert that this
doesn't introduce any behavior change with a GIT_TEST_* knob would be
valuable.

I.e. as long as the intent isn't a behavior change, let's test that
get_superproject_working_tree() doesn't need this across the entire test
suite, with specific tests that opt in to the behavior (or do a whole
test suite run in that mode), rather than the default being
opt-out.

An opt-out is just a recipe for growing accidental implicit
dependencies, which explicitly isn't what we want for a "just an
optimization" knob. We do the same sort of opt-in/opt-out testing for
e.g. split index, untracked cache etc. (see the GIT_TEST_* bits in
ci/run-build-and-tests.sh). AFAICT a fix-up of just adding the
git_env_bool() here to this code in your 3/3 would do it:

	if (!git_env_bool("GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT", 0) &&
	    !git_config_get_bool("submodule.hassuperproject", &has_superproject_cfg) &&
	    !has_superproject_cfg)

And then adding GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT=true to
linux-TEST-vars in ci/run-build-and-tests.sh. The tests that do rely on
submodule.hassuperproject would need to set
GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT=false of course...
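
To make the truth table of that condition explicit, here is a rough
standalone sketch. It does not use git's real helpers: env_bool() and
config_get_bool() are hypothetical stand-ins that only mimic the return
conventions of git_env_bool() and git_config_get_bool() (the latter
returning 0 when the key is found), and guard_taken() is an illustrative
name that just reports whether the "if" above would be entered. What the
guarded branch does in 3/3 is deliberately left out.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for git_env_bool(): returns "def" when the
 * variable is unset. Only a few spellings of "true" are recognized
 * here; git's real parser accepts more.
 */
int env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	return !strcmp(v, "true") || !strcmp(v, "1") || !strcmp(v, "yes");
}

/*
 * Hypothetical stand-in for git_config_get_bool(): returns 0 and
 * fills *dest when the key is set, non-zero when it is not. The
 * "is_set"/"value" parameters simulate the config file contents.
 */
int config_get_bool(int is_set, int value, int *dest)
{
	if (!is_set)
		return 1;
	*dest = value;
	return 0;
}

/*
 * Returns 1 if the suggested condition holds, i.e. the test knob is
 * off, submodule.hasSuperproject is set, and its value is false.
 */
int guard_taken(int cfg_is_set, int cfg_value)
{
	int has_superproject_cfg = 0;

	if (!env_bool("GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT", 0) &&
	    !config_get_bool(cfg_is_set, cfg_value, &has_superproject_cfg) &&
	    !has_superproject_cfg)
		return 1;
	return 0;
}
```

With the variable unset, guard_taken(1, 0) is the only combination that
returns 1; an unset key or a true value both return 0. With
GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT=true in the environment every
combination returns 0, so the config is never consulted, which is the
point of running the test suite in that mode.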

1. https://lore.kernel.org/git/YgF5V2Y0Btr8B4cd@xxxxxxxxxx/
2. https://lore.kernel.org/git/220212.864k53yfws.gmgdl@xxxxxxxxxxxxxxxxxxx/
3. https://lore.kernel.org/git/RFC-cover-0.2-00000000000-20211117T113134Z-avarab@xxxxxxxxx/