Re: [PATCH 2/2] config: fix evaluating "onbranch" with nonexistent git dir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 24, 2024 at 12:05:46PM +0200, Patrick Steinhardt wrote:
> The `include_by_branch()` function is responsible for evaluating whether
> or not a specific include should be pulled in based on the currently
> checked out branch. Naturally, his condition can only be evaluated when
> we have a properly initialized repository with a ref store in the first
> place. This is why the function guards against the case when either
> `data->repo` or `data->repo->gitdir` are `NULL` pointers.
> 
> But the second check is insufficient: the `gitdir` may be set even
> though the repository has not been initialized. Quoting "setup.c":
> 
>   NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
>   code paths so we also need to explicitly setup the environment if the
>   user has set GIT_DIR.  It may be beneficial to disallow bogus GIT_DIR
>   values at some point in the future.
> 
> So when either the GIT_DIR environment variable or the `--git-dir`
> global option are set by the user then `the_repository` may end up with
> an initialized `gitdir` variable. And this happens even when the dir is
> invalid, like for example when it doesn't exist. It follows that only
> checking for whether or not `gitdir` is `NULL` is not sufficient for us
> to determine whether the repository has been properly initialized.
> 

When I dive into this bug report, I feel so wired about this behavior. I
don't mind whether the code sets "gitdir" field. This is not important.
In "setup.c::setup_git_directory_gently", it will set the
"the_repository->gitdir" by checking the environment variable
"GIT_DIR_ENVIRONMENT". And by using `--git-dir` option, the code will
set this environment variable, so these two ways will set the "gitdir"
field in the global variable "the_repository".

We actually check the validation of "--gir-dir" option before we set
this value. Let me give you an example here:

  $ git --git-dir=notexist -c includeIf.onbranch:main.path=any fsck
  fatal: not a git repository: 'notexist'

I am curious here. And I notice the following difference in
"setup.c::setup_explicit_git_dir" function:

    if (!is_git_directory(gitdirenv)) {
        if (nongit_ok) {
            *nongit_ok = 1;
            ...
            return NULL;
        }
        die(_("not a git repository: '%s'"), gitdirenv);
    }

Apparently, the above example will execute the "die". For
"git-archive(1)", it will simply return NULL. This is because we allow
some commands to run outside of the git repo. And we distinguish them by
using the ".option" filed:

    { "apply", cmd_apply, RUN_SETUP_GENTLY },
    { "fsck", cmd_fsck, RUN_SETUP },

As we can see, the code will check whether the "gitdir" (although it is
not set into "the_repository" structure yet) is a valid git repository.
We already have this information.

In f7d61c4135 (config: don't depend on `the_repository` with branch
conditions, 2024-08-13). "config.c::include_by_branch" drops the global
variable "the_repository". This solves the problem reported by Ronan.
Because it happens in the set up process, the "data->repo" will be NULL.

    config_with_options(cb, data, NULL, NULL, &opts);

    int config_with_options(config_fn_t fn, void *data,
                            const struct git_config_source *config_source,
                            struct repository *repo,
                            const struct config_options *opts)
    {
        struct config_include_data inc = CONFIG_INCLUDE_INIT;

        ...

        inc.repo = repo;
    }

But the problem still exists for "git-archive(1)", in "cmd_archive"
function, we will initialize the configurations by using "repo_config".
But we are not inside the repo.

I wonder how we use the global variable "the_repository". I think the
main problem here is that we use "the_repository" structure outside of
the repo where we have already broken the semantics of the
"the_repository" variable.

> This issue can lead to us triggering a BUG: when using a config with an
> "includeIf.onbranch:" condition outside of a repository while using the
> `--git-dir` option pointing to an invalid Git directory we may end up
> trying to evaluate the condition even though the ref storage format has
> not been set up.
> 
> This bisects to 173761e21b (setup: start tracking ref storage format,
> 2023-12-29), but that commit really only starts to surface the issue
> that has already existed beforehand. The code to check for `gitdir` was
> introduced via 85fe0e800c (config: work around bug with
> includeif:onbranch and early config, 2019-07-31), which tried to fix
> similar issues when we didn't yet have a repository set up. But the fix
> was incomplete as it missed the described scenario.
> 

Yes, exactly. Because before 173761e21b, the code will always find the
files backend, it does care about whether we are in the git repo or not.

    unsigned int format = REF_STORAGE_FORMAT_FILES;
    const struct ref_storage_be *be = find_ref_storage_backend(format);

So, it won't complain.

> As the quoted comment mentions, we'd ideally refactor the code to not
> set up `gitdir` with an invalid value in the first place, but that may
> be a bigger undertaking. Instead, refactor the code to use the ref
> storage format as an indicator of whether or not the ref store has been
> set up to fix the bug.
> 

Should we? From my above comments, it does no matter whether we set the
"gitdir" in the "the_repository". Because we already check this and we
have this information. I think the main problem is that we use
"the_repository" badly.

We could run git commands inside the repo or outside the repo. If we run
git commands outside the repo, should we use the "the_repository"
variable? I guess we should not.

Because "git_config", "repo_config" and so on use the global variable
"the_repository", so we will encounter the trouble. But if we could use
something like "data->repo". We will make everything OK:

1. When running commands inside the git repo, we set "data->repo" to be
"the_repository".
2. When ruing commands outside the git repo, we set "data->repo" to be
NULL.

> Reported-by: Ronan Pigott <ronan@xxxxxx>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  config.c                  | 15 +++++++++------
>  t/t1305-config-include.sh |  2 +-
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/config.c b/config.c
> index 1266eab0860..a11bb85da30 100644
> --- a/config.c
> +++ b/config.c
> @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data,
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = (!data->repo || !data->repo->gitdir) ?
> -		NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> -					       "HEAD", 0, NULL, &flags);
> -	const char *shortname;
> +	const char *refname, *shortname;
>  
> -	if (!refname || !(flags & REF_ISSYMREF)	||
> -			!skip_prefix(refname, "refs/heads/", &shortname))
> +	if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> +		return 0;
> +
> +	refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> +					  "HEAD", 0, NULL, &flags);
> +	if (!refname ||
> +	    !(flags & REF_ISSYMREF) ||
> +	    !skip_prefix(refname, "refs/heads/", &shortname))
>  		return 0;
>  
>  	strbuf_add(&pattern, cond, cond_len);
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index ad08db72308..517d6c86937 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' '
>  	test_must_fail nongit git config get foo.bar
>  '
>  
> -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' '
> +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' '
>  	test_when_finished "rm -f .gitconfig config.inc" &&
>  	git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc &&
>  	git config set -f config.inc foo.bar baz &&
> -- 
> 2.46.0.551.gc5ee8f2d1c.dirty




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux