Re: Bare repositories in the working tree are a security risk

Taylor Blau <me@xxxxxxxxxxxx> · Fri, 15 Apr 2022 21:28:07 -0400

On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
> * We all agree that something needs to be done about embedded bare repos. This
>   is a pretty good starting point IMO, because we agree that 'do nothing' isn't
>   a good response.

To be totally honest, I am not absolutely convinced.

I agree that it's sub-optimal that Git is an attack vector for remote
code execution, but I think there is significant social engineering
required in order to meaningfully exploit this. Particularly because an
attacker must convince their victim to:

  - clone the repository,
  - cd into the embedded bare repository, and
  - run a git command

Scripting around the output of git commands in your PS1 makes the latter
more likely, so I think it's worthwhile to explore how to either prevent
this type of attack or make it substantially less likely to have a user
run git commands that execute parts of the config opportunistically.

That said, I think there are other approaches that we could take that
would hopefully disrupt fewer existing workflows.

> * There are use cases for embedded bare repos that don't have great alternatives
>   (e.g. libgit2 uses bare repos in its tests). Even if this workflow is frowned
>   upon (I personally don't think we should support it), I don't think we're
>   ready to categorically declare that Git should ban embedded bare repos
>   altogether (e.g. the way we ban .GiT).

I think there are a handful of legitimate reasons that we might want to
continue supporting this; for projects like libgit2 and Git LFS, it's
useful to have repositories in a known state to execute tests in. Having
bare Git repositories contained in some "test fixture" directory is a
really easy way to do just that.

> * We want additional protection on the client besides `git fsck`; there are
>   a few ways to do this:

I'm a little late to chime into the thread, but I appreciate you
summarizing some of the approaches discussed so far. Let me add my
thoughts on each of these in order:

>   1. Prevent checking out an embedded bare repo.
>   2. Detect if the bare repo is embedded and refuse to work with it.
>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
>      everything else still 'works'.
>   4. Don't detect bare repos.
>   5. Only detect bare repos that are named `.git` [1].
>
>   (I've responded with my thoughts on each of these approaches in-thread).

  1. Likely disrupts too many legitimate workflows for us to adopt
     without designing some way to declare an embedded bare repository
     is "safe".
  2. Ditto.
  3. This seems the most promising approach so far. Similar to (1), I
     would also want to make sure we provide an easy way to declare a
     bare repository as "safe" in order to avoid permanently disrupting
     valid workflows that have accumulated over the past >15 years.
  4. Seems like this approach is too heavy-handed.
  5. Ditto.

> With that in mind, here's what I propose we do next:
>
> * Merge the `git fsck` patch [2] if we think that it is useful despite the
>   potentially huge number of false positives. That patch needs some fixing; I'll
>   make the changes when I'm back.

If there are lots of false positives, we should consider downgrading the
severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm
not clear if there is another reason why this patch would have a
significant number of false positives (i.e., if the detection mechanism
is over-zealous).

But if not, and this does detect only legitimate embedded bare
repositories, we should use it as a reminder to consider how many
use-cases and workflows will be affected by whatever approach we take
here, if any.

> * I'll experiment with (5), and if it seems promising, I'll propose this as an
>   opt-in feature, with the intent of making it opt-out in the future. We'll
>   opt-into this at Google to help figure out if this is a good default or not.
>
> * If that direction turns out not to be promising, I'll pursue (1), since that
>   is the only option that can be configured per-repo, which should hopefully
>   minimize the fallout.

Here's an alternative approach, which I haven't seen discussed thus far:

When a bare repository is embedded in another repository, avoid reading
its config by default. This means that most Git commands will still
work, but without the possibility of running any "executable" portions
of the config. To opt-out (i.e., to allow legitimate use-cases to start
reading embedded bare repository config again), the embedding repository
would have to set a multi-valued `safe.embeddedRepo` configuration. This
would specify a list of paths relative to the embedding repository's
root of known-safe bare repositories.

I think there are a couple of desirable attributes of this approach:

  - It minimally disrupts bare repositories, restricting the change to
    only embedded repositories.
  - It allows most Git commands to continue working as expected (modulo
    reading the config), hopefully making the population of users whose
    workflows will suddenly break pretty small.
  - It requires the user to explicitly opt-in to the unsafe behavior,
    because an attacker could not influence the embedding repository's
    `safe.embeddedRepo` config.

If we were going to do something about this, I would strongly advocate
for something that resembles the above. Or at the very least, some
solution that captures the attributes I outlined there.

I would be happy to work together with you (or anybody!) on developing
patches in that direction, so let me know if you are interested in
coordinating our efforts.

> Given that this embedded bare repo problem has been known for a long time, I
> don't think we need to rush out a fix, but (especially since I'll be OOO) I'm
> more than happy for someone to take my ideas (or any ideas) and run with them.

No rush. Regardless of your time out-of-office, we should take advantage
of the fact that this is a long-known wart to carefully craft a solution
that provides some additional safety while disrupting as few existing
workflows as possible.

Thanks,
Taylor