Re: [PATCH v6 0/5] config: introduce discovery.bare and protected config

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 08 Jul 2022 23:58:22 +0200

On Fri, Jul 01 2022, Glen Choo wrote:

Sorry for the late reply, I see there's a v7, but this seems to also
apply to it, so...

> Thanks for weighing in :) Despite the different proposed approaches, I
> think we actually are in broad agreement.
>
> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Thu, Jun 30 2022, Glen Choo via GitGitGadget wrote:
>>
>>> This is a quick re-roll to address Ævar's comments on the tests (thanks!).
>>
>> Thanks!
>>
>>> = Description
>>
>> Just more generally on this series & approach. I know this is a v6 by
>> now, but I haven't kept up with this topic, but to be fair I did mention
>> pretty much this in:
>> https://lore.kernel.org/git/220407.86lewhc6bz.gmgdl@xxxxxxxxxxxxxxxxxxx/
>>
>> So...
>>
>>> There is a known social engineering attack that takes advantage of the fact
>>> that a working tree can include an entire bare repository, including a
>>> config file. A user could run a Git command inside the bare repository
>>> thinking that the config file of the 'outer' repository would be used, but
>>> in reality, the bare repository's config file (which is attacker-controlled)
>>> is used, which may result in arbitrary code execution. See [1] for a fuller
>>> description and deeper discussion.
>>>
>>> This series implements a simple way of preventing such attacks: create a
>>> config option, discovery.bare, that tells Git whether or not to die when it
>>> finds a bare repository. discovery.bare has two values:
>>>
>>>  * "always": always allow bare repositories (default), identical to current
>>>    behavior
>>>  * "never": never allow bare repositories
>>>
>>> and users/system administrators who never expect to work with bare
>>> repositories can secure their environments using "never". discovery.bare has
>>> no effect if --git-dir or GIT_DIR is passed because we are confident that
>>> the user is not confused about which repository is being used.
>>
>> I'm not insisting that the entire approach here should be changed, but
>> in the above exchange you seemed to have performance concerns about the
>> "just walk up in setup.c" approach I mentioned, but it's not clear if
>> that's still the only thing that necessitates taking this approach.
>>
>> There may be security subtleties that I've missed, but from the
>> description here it seems like that would work equally well, and
>> wouldn't require configuration, except insofar as we'd need to opt-in to
>> reading config from bare repositores *that also exist in a parent tree*.
>>
>> And it would be a more narrow & more secure solution, since it would
>> e.g. allow you to intentionally navigate to /var/repos/git/git.git in a
>> server setup and read the config there, which it could distinguish from
>> a case of /var/repos/.git existing, and git/git.git being brought in as
>> a part of that "parent" repo.
>
> Performance is one major concern, yes, and I agree that your findings
> show that the "just walk up" approach is cheap enough to consider doing.
> Though in the few cases where it isn't cheap to walk, wouldn't it still
> be useful to be able to opt out of it?

Maybe, but until we at least have a reason to think that there is a
performance concern this all seems rather hypothetical.

In the thread(s) I linked to I noted that I tested my POC implementation
on AIX, which has a sloooooooow filesystem.

Why are you concerned about this having a performance impact?

> The other concern is simplicity and correctness. Are we confident that
> we'll get the design of "just walk up" correct (including edge cases
> like "bare repo in bare repo in non bare repo")? I'm 100% confident that
> we'll get it right eventually, and that this approach will be a good
> default for all users. But in comparison, "never" is so much easier to
> understand and implement that I don't see why we shouldn't start by
> presenting this option to the 0.1-1% of users who would find it useful.

If you run "git status" or whatever in a directory anywhere on your FS
now it'll confidently tell you if you're not within a git repository.

We're just talking about re-using that code, why would we be concerned
that:

 1. Finding a repo at a given <PATH>
 2. Appending "/.." to that <PATH>
 3. Feeding it to a "given a path, find me the git repo" (code setup.c
    already has)

Is something we'd get wrong?

> And on the topic of simplicity, there's significant interest in
> maintaining backwards-compatibility with repos with workflows that
> absolutely depend on embedded bare repos, e.g. libgit2 and Git-LFS.
> That's yet another special case that we'd have to get right. Stolee's
> "no-embedded" proposal [1] pretty much covers that, but I don't see the
> harm in simplifying the design space by making bare repo support a
> non-goal.
>
> [1] https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@xxxxxxxxxx

Sure, and we could have a config know to still use such repos (for
config or otherwise) with the "walk up" method, does that address your
concern here?

>> The "more narrow" and "more secure" go hand-in-hand, since if you work
>> on such servers you'd turn this to "always" because you want to read
>> such config, but then be left vulnerable to the actual (and muche rarer)
>> exploit we're trying to prevent.
>
> The point that we're not defending bare repo users is fair, but maybe
> the group we're trying to protect isn't really dedicated Git-serving
> servers. This exploit requires you to have a bare repo inside the
> working tree of a non-bare repo. So I think this is less of an issue for
> a server, and more for "mixed-use" environments with both regular and
> bare clones.

Yes, but this is only something that's even a question because of an
artificial limitation your proposal here suffers from.

I.e. in trying to detect nefarious repos where you've got "looks like
bare" content *tracked* in another repo you're conflating it with *any
bare repo*.

And the only reason we're doing that seems to me to be a premature
optimization.

>> Which, it seems...
>>
>>> This series does not change the default behavior, but in the long-run, a
>>> "no-embedded" option might be a safe and usable default [2]. "never" is too
>>> restrictive and unlikely to be the default.
>>
>> This series has (since v3?) been noting aspirations to have a
>> "no-embedded" variant of this config, which your 5/5 here notes would be
>> better, but isn't implemented by this series.
>>
>> But your 5/5 also notes:
>>
>>     but detecting if a repository is embedded is potentially
>>     non-trivial, so this work is not implemented in this series.
>>
>> Hrm, well, the diff-stat isn't quite that trivial either :) :
>
> Well.. a lot of it is refactoring :P
>
>>> [...]
>>>  upload-pack.c                       | 27 ++++++----
>>>  12 files changed, 304 insertions(+), 47 deletions(-)
>>
>> In threads linked from the above ML link I linked to some POC code
>> showing how to hack a second .git discovery walk into setup.c. This was
>> as part of the "submodule parent dir" proposal, which is a different
>> feature, but also needs such "find the parent" code:
>> https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@xxxxxxxxxxxxxxxxxxx/
>>
>> Now, obviously that's a dirty hack, but it's not that hard to just
>> change the part of setup.c where we're satisfied that we've found the
>> git dir, then walk up "$THAT_DIR/..", and start our search again.
>>
>> Then:
>>
>> 	if (first_dir_was_bare() && found_parent_dir())
>>         	enforce_no_embedded();
>>
>> Isn't that what your proposed "no embedded" option would need to do?
>> Well, maybe we'd also check if the "first dir" is in the index of the
>> parent, as opposed to just being a bare .git somewhere in ~/Downloads,
>> e.g. if you have a ~/.git and keep your dot-files in git.
>>
>> But I think for an initial implementation just doing the walk would be
>> good enough, and would have a more narrow scope than this configuration
>> setting.
>
> A narrow scope is good, but I don't agree on this definition of
> "narrow". My preference is to give an obvious solution to a 'narrow'
> group of users, instead of a more tricky solution that affects all users
> in a 'narrow' set of cases.

We could still have an option to say "I never want to consider bare
repos from config", but if we're able to distinguish *tracked* bare
repos from non-tracked ones this is entirely unrelated to the initial
stated motivation for this series, isn't it?

I.e. to solve the security issue at hand.

I've got nothing against *also* providing a "I never want config from
xyz repos", but that seems to be orthagonal.

>> AFAICT the performance concerns aren't supported by any data, in the
>> case of the "submodule superproject" feature it turned out to not be the
>> directory walk, but us shelling out in a loop in git-submodule.sh.
>>
>> Well, *maybe* that's not the case, I think I have managed to read
>> between the lines of some of these past exchanges that there's some odd
>> propriterary internal NFS-like setup at Google where *parent dirs* are
>> auto-mounted and searched on access, so a "walk up" pattern would be
>> much more expensive.
>>
>> I do worry a bit about us ending up with design choices in git that we
>> wouldn't have ended up with, if not to cater to some in-house setup
>> somwhere that 99.99% of git users will never see.
>
> At the very least, I don't think you're saying that it's a bad idea to
> have "never", just that we might not have come up with it if not for
> some Google NFS thing.

I'm just speculating as to why we've ended up with this approach, but
maybe I'm wrong.

> Another use case I can think of is CI bots, which have no need for bare
> repos. To some folks (maybe in very security-sensitive environments),
> "never" might give more peace of mind than "no-embedded".

Sure, but again, I've got nothing against having *more config knobs*,
and I've often e.g. wanted some way to tell git to read no config at all
(which you can sort of do with the right environment variables, but it's
a hassle).

But I don't see how that's not unrelated to addressing the issue this
series aims to address, and what I'm pointing out that it's doing so
with a method that's less accurate than the "walk up" method, and less
secure.