Re: [RFC PATCH 0/2] submodule: test what happens if submodule.superprojectGitDir isn't around

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 17, 2021 at 12:43:38PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Nov 16 2021, Emily Shaffer wrote:
> 
> > [...]
> > A couple things. Firstly, a semantics change *back* to the semantics of
> > v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> > so that theoretically a submodule with multiple worktrees in multiple
> > superproject worktrees will be able to figure out which worktree of the
> > superproject it's in. (Realistically, that's not really possible right
> > now, but I'd like to change that soon.)
> >
> > Secondly, a rewording of comments and commit messages to indicate that
> > this isn't a cache of some expensive operation, but rather intended to
> > be the source of truth for all submodules. I also added a fifth commit
> > rewriting `git rev-parse --show-superproject-working-tree` to
> > demonstrate what that means in practice - but from a practical
> > standpoint, I'm a little worried about that fifth patch. More details in
> > the patch 5 description.
> >
> > I did discuss Ævar's idea of relying on in-process filesystem digging to
> > find the superproject's gitdir with the rest of the Google team, but in
> > the end decided that there are some worries about filesystem digging in
> > this way (namely, some ugly interactions with network drives that are
> > actually already an issue for Googler Linux machines). Plus, the allure
> > of being able to definitively know that we're a submodule is pretty
> > strong. ;) But overall, this is the direction I'd prefer to keep going                                                                                                                          
> > in, rather than trying to guess from the filesystem going forward.
> 
> Did you try running the ad-hoc benchmark I included in [1] on that
> Google NFS? I've dealt with some slow-ish network filesystems, but if
> it's slower than AIX's local FS (where I couldn't see a difference) I'd
> put money on it being a cross-Atlantic mount or something :)
> 
> Re your:
> 
>     "this isn't a cache of some expensive operation, but rather intended to                                                                                                                          be the source of truth for all submodules."
> 
> In your 5/5 it says, in seeming contradiction to this:
> 
>     This commit may be more of an RFC - to demonstrate what life looks like
>     if we use submodule.superprojectGitDir as the source of truth. But since
>     'git rev-parse --show-superproject-working-tree' is used in a lot of
>     scripts in the wild[1], I'm not so sure it's a great example.
> 
>     To be honest, I'd prefer to die("Try running 'git submodule update'")
>     here, but I don't think that's very script-friendly. However, falling
>     back on the old implementation kind of undermines the idea of treating
>     submodule.superprojectGitDir as the point of truth.
> 
> Most of what I've been suggesting in my [1] and related is that I'm
> confused about if & how this is a pure caching mechanism.
> 
> Removing mentions of it being a cache but it seemingly still being a
> cache at the tip of this series has just added to that confusion for
> me :)

Yeah, I think this was a bad choice for me to include that patch. I was
really hopeful that I could show off "look, we don't need to ever hunt
in the FS above us", but for established repos, that's a bad idea
(because lots of people are already using this 'git rev-parse
--show-superproject-work-tree' thing in scripts, like I mentioned). So I
think it was a mistake to include it at all. Rather, I think it's
probably a better idea to treat that particular entry point as "legacy"
and implement other things using 'submodule.superprojectGitDir'
directly.

Because the patch 5 illustrates: "I'm saying that this new config isn't
a cache, but look, here's how I can treat it like a cache that might be
invalid and here's how I can fall back on a potentially expensive
operation anyways." I think I could have illustrated it a little better
with something like "here's a brand new 'git rev-parse
--show-superproject-gitdir'" which directly calls on the new config.

So, sorry about that.

> 
> Anyway. While I do think this caching mechanism is probably
> unnecessary in the short to medium term, i.e. it seems to the extent
> that it was ever needed was due to some bridging of *.sh<->*.c that
> we're *this* close to eliminating anyway.
> 
> But maybe I'm wrong. The benchmark I suggested above on that Google
> NFS might be indicative. I don't really see how something that'll be
> doing a bunch of FS ops anyway is going to be noticeably slower with
> that approach, but maybe opening the index/tree of the superproject is
> more expensive than I'm expecting.
> 
> In any case, all of that's not the hill I'm picking to die on. If
> you'd like to go ahead with this cache-or-not-a-cache then sure, I
> won't belabor that point.

Yeah, I think I would. I've heard some serious reservations from others
on my team about trying to use filesystem traversal here at all, so I
think that would be an uphill battle.

> 
> I *do* strongly think if we're doing so though that we should have
> something like this on top. I.e. let's test wha happens if we do and
> don't have this "caching" variable, which is demonstrably easy to do.
> 
> Benchmarking the two gives me:
> 
>     $ git hyperfine -L rev HEAD~0 -L s true,false -s 'make -j8 all' '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR={s} ./t7412-submodule-absorbgitdirs.sh)'
>     Benchmark 1: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
>       Time (mean ± σ):     545.9 ms ±   1.6 ms    [User: 490.3 ms, System: 114.0 ms]
>       Range (min … max):   543.5 ms … 548.1 ms    10 runs
>      
>     Benchmark 2: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
>       Time (mean ± σ):     537.9 ms ±  11.4 ms    [User: 476.8 ms, System: 117.6 ms]
>       Range (min … max):   532.7 ms … 570.1 ms    10 runs
>      
>     Summary
>       '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0' ran
>         1.01 ± 0.02 times faster than '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0'
> 
> I.e. not using the cache is either indistinguishable or a bit faster
> (the "a bit faster" is definitely due to just running less test code
> though).

Yeah, once again, I think it is better to treat "git rev-parse
--show-superproject-work-tree" as "legacy" and to rely solely on the
config for new options, meaning that "what happens without this
variable" is as simple as "we treat it like it's a standalone repository
with no superproject", rather than a performance difference at all.

 - Emily

> 
> I'm sending this before the CI run[2] finishes (which now tests both
> modes), but both of these work for me locally on a full test suite
> run.
> 
> 1. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@xxxxxxxxxxxxxxxxxxx/
> 2. https://github.com/avar/git/runs/4237446991?check_suite_focus=true
> 
> Ævar Arnfjörð Bjarmason (2):
>   submodule tests: fix potentially broken "config .. --unset"
>   submodule: add test mode for checking absence of "superProjectGitDir"
> 
>  ci/run-build-and-tests.sh          |  1 +
>  git-submodule.sh                   |  2 +-
>  submodule.c                        |  7 +++++++
>  t/lib-submodule-superproject.sh    | 24 ++++++++++++++++++++++++
>  t/t7406-submodule-update.sh        | 13 ++++++-------
>  t/t7412-submodule-absorbgitdirs.sh | 19 ++++++-------------
>  6 files changed, 45 insertions(+), 21 deletions(-)
>  create mode 100644 t/lib-submodule-superproject.sh
> 
> -- 
> 2.34.0.796.g2c87ed6146a
> 



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux