Re: [PATCH 04/31] setup: don't perform lazy initialization of repository state

Brandon Williams <bmwill@xxxxxxxxxx> · Mon, 5 Jun 2017 11:44:12 -0700

On 06/05, Jeff King wrote:
> On Mon, Jun 05, 2017 at 10:43:52AM -0700, Brandon Williams wrote:
> 
> > > In the case of setup_git_env(), I think it was less about doing work and
> > > more that we didn't want to have to do explicit setup in each program.
> > > But over the years we've moved away from that, and in fact if you hit
> > > the lazy initialization these days you'll generally BUG() anyway.
> > > 
> > > _But_ I suspect there are still some cases you'd need to handle. For
> > > instance, it's still OK to skip calling setup_git_directory() if you've
> > > got $GIT_DIR in the environment (which is why we have have_git_dir()
> > > instead of checking startup_info->have_repository).
> > 
> > Yes there are a couple places that rely on the lazy initialization but
> > that's not due to setup not being run.  Rather it has to do with GIT_DIR
> > being set to a bogus directory so when setup is run gently it does
> > nothing.  Then at a later point in time the command tries to access
> > files in the gitdir (which triggers lazy init of the git environment).
> > 
> > So I think that explicitly doing the 'lazy init' portion (which ensures
> > that the env gets setup even if GIT_DIR is bogus) at the end of setup
> > should be sufficient, least it seems to be so though perhaps we can't
> > rely on our tests to tell us that.
> 
> In have_git_dir() we do:
> 
>   int have_git_dir(void)
>   {
> 	  return startup_info->have_repository
> 		  || git_dir
> 		  || getenv(GIT_DIR_ENVIRONMENT);
>   }
> 
> and generally call that right before calling a function that might do
> setup_git_env(). We can assume that if have_repository is set that we
> have an actual git_dir (we can check all of the assignment spots and
> verify that they always set it in tandem with the global git_dir).
> And obviously if git_dir is set, we're good.
> 
> But if that third condition ever triggers, it's because we're relying on
> the lazy setup to copy it into git_dir. And if I understand correctly,
> that would turn into a BUG with your patch.
> 
> I guess my question is: does that third condition actually trigger in
> practice? I added those conditions in b9605bc4f (config: only read
> .git/config from configured repos, 2016-09-12). I remember there being
> some reason for needing those back then, but the commit message doesn't
> say. But if I remove them and just check have_repository (either on top
> of that commit or on top of the current master) the tests all still
> pass.

The only instance I saw the third condition trigger was when GIT_DIR was
set to a bogus gitdir so when reading the early config a bogus
gitdir/config file was attempted to be read resulting in a noop.  If
GIT_DIR is set to a valid gitdir then the other condition in
read_early_config (which does repository discovery itself) would
correctly catch and read from the repository's config.

> 
> So I'm not sure at this point what the case was that motivated it, or if
> it really was just an abundance of caution. But if there is such a case,
> I suspect it's broken by your patch series.
> 
> I'm not sure we have a good way to find out, though. Like the current
> BUG() in setup_git_env, I think we'll just have to cook the patches for
> a long time, see if anybody triggers it, and deal with it on a case by
> case basis.
> 
> I do kind of wonder if we should simplify have_git_dir(), if those other
> conditions aren't actually triggering.

If this ends up being that big of an issue I'm sure that it would be
easy to adapt this and revert to doing lazy-init.  I just saw how
clunky it is and was trying to work to start the process of cleaning it
up so that the code becomes easier to reason about.  It seemed to make
things easier for me to package them in a struct at least (and reason
about how the config system would work once stored in a repository
struct).

-- 
Brandon Williams