Re: Bare repositories in the working tree are a security risk

Glen Choo <chooglen@xxxxxxxxxx> · Mon, 02 May 2022 12:39:01 -0700

Taylor Blau <me@xxxxxxxxxxxx> writes:

>> >>   1. Prevent checking out an embedded bare repo.
>> >>   2. Detect if the bare repo is embedded and refuse to work with it.
>> >>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
>> >>      everything else still 'works'.
>> >>   4. Don't detect bare repos.
>> >>   5. Only detect bare repos that are named `.git` [1].
>> >>
>> >>   (I've responded with my thoughts on each of these approaches in-thread).
>> >
>> >   1. Likely disrupts too many legitimate workflows for us to adopt
>> >      without designing some way to declare an embedded bare repository
>> >      is "safe".
>> >   2. Ditto.
>> >   3. This seems the most promising approach so far. Similar to (1), I
>> >      would also want to make sure we provide an easy way to declare a
>> >      bare repository as "safe" in order to avoid permanently disrupting
>> >      valid workflows that have accumulated over the past >15 years.
>> >   4. Seems like this approach is too heavy-handed.
>> >   5. Ditto.
>>
>> If I understand you correctly, it seems like we can ship any of the options from
>> 1.-3., provided there is an easy way to opt-in known, "safe" bare repos.
>
> After thinking about it some more, I think that we should probably try
> to ship (3) of the ones that we agree are viable, but more on that
> below...
>
>> I really like the `safe.embeddedRepo` idea, though I'm not convinced about
>> "respect only the safe parts of the embedded repo". I'll address the latter
>> first.
>
> To be clear, I am advocating for "only the safe parts" insofar as "read
> repository extensions, core.repositoryFormatVersion and literally
> nothing else". I'm definitely not suggesting we go and enumerate every
> configurable value, determine whether it's safe or not, and then read
> only the safe ones. That approach seems doomed to fail, since no matter
> how clever we are, there will always be some slightly-cleverer attacker
> who can find a vector that we missed.

[...]

>> - This complicates the rules significantly for the user, who now has to figure
>>   out which parts of the bare repo are respected and which are not.
>
> On this point I disagree, but I suspect we weren't on the same page
> about what "only the safe parts" meant when you wrote this. To be
> extra-extra clear, I don't think we should read some parts of config and
> not other, I mean we should read _only_ the above listed parts (the
> format version and extensions) and nothing else.

Ah, to clarify, I'm taking an even more pessimistic stance here, which
is that I'd prefer to avoid trusting any parts of the repo (not the
config) as "safe", e.g. I'm not even keen on something as
innocuous-looking as "We trust _only_ HEAD, refs/ and objects/".

Maybe this is overly-pessimistic though; your response (further down)
suggests that this might not be as bad as I think.

>> - I'm also of the opinion that changing the rules like this actually does affect
>>   workflows, even if it doesn't break libgit2's tests.
>>   - A diligent user still has to convince themselves that the tests are passing
>>     for the right reasons, possibly adapting to the new rules (e.g. by
>>     selectively enabling `safe.embeddedRepo` on the right test fixtures).
>>   - A less diligent user might not even realize the change has happened and
>>     end up with difficult to debug results somewhere down the line.
>
> I am sympathetic to what you're saying, but I (a) think there's still a
> tradeoff here that doesn't obviously point us in one direction or the
> other and (b) we should equally keep in mind other workflows besides
> just test fixtures. Does that change our thinking at all? I'm not sure.

Just musing here. I think (b) is particularly important to keep in mind.
If a workflow is well-served by something else, I really don't mind
breaking it at all.

I suppose the test fixture use case comes up the most frequently because
it is one that we are familiar with, and doesn't seem to have a good
alternative.

Maybe we're approaching the limits of what we can know without
performing some studies of users in the wild.

>> I'm also not keen on it for other reasons:
>>
>> - This expands the attack surface significantly, and I'm pessimistic that we
>>   can maintain a list of the 'safe' parts of a bare repo. A lot of attention has
>>   been focused on config/hooks, but I wouldn't be surprised if a creative
>>   attacker finds some other avenue that we missed (maybe a buffer overflow
>>   attack on a malicious index file?).
>
> I disagree, though again I suspect we were thinking of different thing
> when saying "only read safe parts of the config". Still though, I would
> argue that it limits the attack surface at the right level, which is to
> say any vector that we _did_ miss is something that we should just fix
> (e.g., preventing a buffer overflow) and not "oops, this config value
> does specify an executable".
>
> (We shouldn't have to deal with the index file, though, since a bare
> repository would not read the index, no?).

I think this comment highlights the difference in opinion (in a good
way!).

IMO 'only reading parts of the repo and not others' will fix this
particular arbitrary code execution problem, but it won't fix (and might
even open us up to) other, future problems with embedded bare repos.

>From this comment, I get the impression (and I hope I'm representing you
correctly) that you think these future problems are manageable, and
don't justify breaking more workflows than we need to in order to fix
this arbitrary code execution problem.

I think it's a healthy disagreement though - there's enough common
ground for us to start working our approaches separately (plus I don't
think either of us will be able to convince the other by thinking up
hypotheticals on a mailing list thread :p), and the discussion will
probably get even better once we start comparing, reviewing, and
internally testing the patches.

(Ah, I'm not sure about the index file in particular, I was just
speculating.)

>> - I expect that this is also going to be really complex to implement and
>>   maintain; instead of looking in a single gitdir for everything, we now look in
>>   two gitdirs.
>
> I'd think that any approach we take that has different behavior
> for bare repositories depending on whether or not they are embedded has
> to do a similar check, so I don't think this adds significant
> complexity. Though not having written any code here yet, I'd take what I
> say with a huge grain of salt ;-).

;)

>> - There might be some unifying principles behind "allowlisting certain embedded
>>   bare repos" and "disabling/enabling bare repo detection" that can guide our
>>   fix.
>>   - Perhaps we could allow different 'levels' of bare repo protection, like
>>     'allow all bare repos', 'allow only non-embedded bare repos', 'allow no bare
>>     repos', 'allow embedded bare repos but not their configs".
>>
>>   - If we do want to discourage embedded bare repos (and flip the default), this
>>     kind of gradual roll-out might give projects a way to incrementally migrate.
>
> Ah! Are you suggesting a global configuration setting that controls the
> behavior of embedded bare repositories that _aren't_ listed in a
> repositories safe.embeddedRepo list?

Yes, and an even more restrictive mode that disables bare repo detection
altogether. I suspect this is a safe, non-disruptive default for many
user, but it obviously can't be the default today (or possibly ever?
we'd need a lot of user testing for sure).