Re: [RFC PATCH 0/4] cache parent project's gitdir in submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 11, 2021 at 3:54 PM Emily Shaffer <emilyshaffer@xxxxxxxxxx> wrote:
>
> It's necessary for a superproject to know which submodules it contains.
> However, historically submodules do not know they are anything but a
> normal single-repo Git project (or a superproject, in nested-submodules
> cases). This decision does help prevent us from having to support
> divergent behaviors in submodule projects vs. superprojects, which makes
> sure Git is (somewhat) less confusing for the reader, and helps simplify
> our code.
>
> One could imagine, though, some conveniences we could gain from
> submodules learning added behavior (though not necessarily *different*
> behavior) to provide more context about the state of the project as a
> whole, and to make large submodule-based projects easier to work with.
> One example is a series[1] I sent some time ago, adding a config to be
> shared between the superproject and all its submodules. The RFC[2] I
> sent around the same time mentions a few other examples, such as "git
> status" run in the submodule noticing whether the superproject has a
> commit referencing the submodule's current HEAD.
>
> It's expensive and non-definitive to try and guess whether or not the
> current repo is a submodule. submodule.c:get_superproject_working_tree()
> does so by essentially running 'git -C .. ls-files -- <own-path>',
> invoking an additional process. get_superproject_working_tree() is not
> called often, so that's mostly fine. However, [1] attempted to include
> an additional config located in the superproject's gitdir by running
> 'git -C .. rev-parse --git-dir' during startup - a little expensive in
> the best case, because it's an extra process, but extremely expensive in
> the case when the current repo is *not* a submodule, because we hunt all
> the way up the filesystem looking for a '.git'. Adding that cost to
> every startup is infeasible.
>

It also adds cost for no benefit to the normal case where it's not a
submodule. The cost added for non-submodules ought to be as near-zero
as possible.

> To that end, in this series I propose caching a path to the
> superproject's gitdir - by having the superproject write that relative
> path to the submodule's config on creation or update. The goal here is
> *not* to say "If I am a submodule, I must have
> submodule.superprojectGitDir set" - but instead to say "If I have
> submodule.superprojectGitDir set, then I must be a submodule." That is,
> I expect we will find edge cases where a submodule was introduced in
> some interesting way that bypassed any of the patches below, and
> therefore doesn't have the superproject's gitdir cached.
>
> The combination of these two rules:
>  - Anything relying on submodule.superprojectGitDir must be nice to
>    have, but not essential, because
>  - It's possible for a submodule to be valid without having
>    submodule.superprojectGitDir set
> makes me feel more comfortable with the idea of submodules learning
> additional behavior based on this config. I feel pretty unconfident in
> our ability to ensure that *every* submodule has this config set.
>

I think this is a good direction. I do think having some information
about being a submodule could be very useful for tools such as git
status, and making it more of a "if we know for sure, we get some
small benefit, but if we don't know then it's no harm" is a good
direction.

> The series covers a few paths for introducing that config, which I'm
> hoping covers most cases.
>  - "git submodule update" (which seems to be part of the "git submodule
>    init" flow)
>  - "git submodule absorbgitdir" to convert a "git init"'d repo into a
>    submodule
>
> Notably, we can only really set this config when 'the_repository' is the
> superproject - that appears to be the only time when we know the gitdirs
> of both the superproject and the submodule.

We could also presumably add a new command for setting this?

>
> I'm expecting folks may have a lot to say about this, so I look forward
> to discussion :)
>
>  - Emily
>
> 1: https://lore.kernel.org/git/20210423001539.4059524-1-emilyshaffer@xxxxxxxxxx
> 2: https://lore.kernel.org/git/YHofmWcIAidkvJiD@xxxxxxxxxx
>
> Emily Shaffer (4):
>   t7400-submodule-basic: modernize inspect() helper
>   introduce submodule.superprojectGitDir cache
>   submodule: cache superproject gitdir during absorbgitdirs
>   submodule: cache superproject gitdir during 'update'
>
>  builtin/submodule--helper.c        |  4 +++
>  git-submodule.sh                   |  9 ++++++
>  submodule.c                        | 10 ++++++
>  t/t7400-submodule-basic.sh         | 49 ++++++++++++++----------------
>  t/t7406-submodule-update.sh        | 10 ++++++
>  t/t7412-submodule-absorbgitdirs.sh |  1 +
>  6 files changed, 57 insertions(+), 26 deletions(-)
>
> --
> 2.32.0.272.g935e593368-goog
>



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux