On Wed, Nov 17, 2021 at 12:43:38PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Nov 16 2021, Emily Shaffer wrote: > > > [...] > > A couple things. Firstly, a semantics change *back* to the semantics of > > v3 - we map from gitdir to gitdir, *not* from common dir to common dir, > > so that theoretically a submodule with multiple worktrees in multiple > > superproject worktrees will be able to figure out which worktree of the > > superproject it's in. (Realistically, that's not really possible right > > now, but I'd like to change that soon.) > > > > Secondly, a rewording of comments and commit messages to indicate that > > this isn't a cache of some expensive operation, but rather intended to > > be the source of truth for all submodules. I also added a fifth commit > > rewriting `git rev-parse --show-superproject-working-tree` to > > demonstrate what that means in practice - but from a practical > > standpoint, I'm a little worried about that fifth patch. More details in > > the patch 5 description. > > > > I did discuss Ævar's idea of relying on in-process filesystem digging to > > find the superproject's gitdir with the rest of the Google team, but in > > the end decided that there are some worries about filesystem digging in > > this way (namely, some ugly interactions with network drives that are > > actually already an issue for Googler Linux machines). Plus, the allure > > of being able to definitively know that we're a submodule is pretty > > strong. ;) But overall, this is the direction I'd prefer to keep going > > in, rather than trying to guess from the filesystem going forward. > > Did you try running the ad-hoc benchmark I included in [1] on that > Google NFS? I've dealt with some slow-ish network filesystems, but if > it's slower than AIX's local FS (where I couldn't see a difference) I'd > put money on it being a cross-Atlantic mount or something :) > > Re your: > > "this isn't a cache of some expensive operation, but rather intended to be the source of truth for all submodules." > > In your 5/5 it says, in seeming contradiction to this: > > This commit may be more of an RFC - to demonstrate what life looks like > if we use submodule.superprojectGitDir as the source of truth. But since > 'git rev-parse --show-superproject-working-tree' is used in a lot of > scripts in the wild[1], I'm not so sure it's a great example. > > To be honest, I'd prefer to die("Try running 'git submodule update'") > here, but I don't think that's very script-friendly. However, falling > back on the old implementation kind of undermines the idea of treating > submodule.superprojectGitDir as the point of truth. > > Most of what I've been suggesting in my [1] and related is that I'm > confused about if & how this is a pure caching mechanism. > > Removing mentions of it being a cache but it seemingly still being a > cache at the tip of this series has just added to that confusion for > me :) Yeah, I think this was a bad choice for me to include that patch. I was really hopeful that I could show off "look, we don't need to ever hunt in the FS above us", but for established repos, that's a bad idea (because lots of people are already using this 'git rev-parse --show-superproject-work-tree' thing in scripts, like I mentioned). So I think it was a mistake to include it at all. Rather, I think it's probably a better idea to treat that particular entry point as "legacy" and implement other things using 'submodule.superprojectGitDir' directly. Because the patch 5 illustrates: "I'm saying that this new config isn't a cache, but look, here's how I can treat it like a cache that might be invalid and here's how I can fall back on a potentially expensive operation anyways." I think I could have illustrated it a little better with something like "here's a brand new 'git rev-parse --show-superproject-gitdir'" which directly calls on the new config. So, sorry about that. > > Anyway. While I do think this caching mechanism is probably > unnecessary in the short to medium term, i.e. it seems to the extent > that it was ever needed was due to some bridging of *.sh<->*.c that > we're *this* close to eliminating anyway. > > But maybe I'm wrong. The benchmark I suggested above on that Google > NFS might be indicative. I don't really see how something that'll be > doing a bunch of FS ops anyway is going to be noticeably slower with > that approach, but maybe opening the index/tree of the superproject is > more expensive than I'm expecting. > > In any case, all of that's not the hill I'm picking to die on. If > you'd like to go ahead with this cache-or-not-a-cache then sure, I > won't belabor that point. Yeah, I think I would. I've heard some serious reservations from others on my team about trying to use filesystem traversal here at all, so I think that would be an uphill battle. > > I *do* strongly think if we're doing so though that we should have > something like this on top. I.e. let's test wha happens if we do and > don't have this "caching" variable, which is demonstrably easy to do. > > Benchmarking the two gives me: > > $ git hyperfine -L rev HEAD~0 -L s true,false -s 'make -j8 all' '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR={s} ./t7412-submodule-absorbgitdirs.sh)' > Benchmark 1: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0 > Time (mean ± σ): 545.9 ms ± 1.6 ms [User: 490.3 ms, System: 114.0 ms] > Range (min … max): 543.5 ms … 548.1 ms 10 runs > > Benchmark 2: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0 > Time (mean ± σ): 537.9 ms ± 11.4 ms [User: 476.8 ms, System: 117.6 ms] > Range (min … max): 532.7 ms … 570.1 ms 10 runs > > Summary > '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0' ran > 1.01 ± 0.02 times faster than '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0' > > I.e. not using the cache is either indistinguishable or a bit faster > (the "a bit faster" is definitely due to just running less test code > though). Yeah, once again, I think it is better to treat "git rev-parse --show-superproject-work-tree" as "legacy" and to rely solely on the config for new options, meaning that "what happens without this variable" is as simple as "we treat it like it's a standalone repository with no superproject", rather than a performance difference at all. - Emily > > I'm sending this before the CI run[2] finishes (which now tests both > modes), but both of these work for me locally on a full test suite > run. > > 1. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@xxxxxxxxxxxxxxxxxxx/ > 2. https://github.com/avar/git/runs/4237446991?check_suite_focus=true > > Ævar Arnfjörð Bjarmason (2): > submodule tests: fix potentially broken "config .. --unset" > submodule: add test mode for checking absence of "superProjectGitDir" > > ci/run-build-and-tests.sh | 1 + > git-submodule.sh | 2 +- > submodule.c | 7 +++++++ > t/lib-submodule-superproject.sh | 24 ++++++++++++++++++++++++ > t/t7406-submodule-update.sh | 13 ++++++------- > t/t7412-submodule-absorbgitdirs.sh | 19 ++++++------------- > 6 files changed, 45 insertions(+), 21 deletions(-) > create mode 100644 t/lib-submodule-superproject.sh > > -- > 2.34.0.796.g2c87ed6146a >