Re: [PATCH] fsck: detect bare repos in trees and warn

Glen Choo <chooglen@xxxxxxxxxx> · Wed, 13 Apr 2022 15:24:42 -0700

Derrick Stolee <derrickstolee@xxxxxxxxxx> writes:

> On 4/7/2022 8:42 AM, Johannes Schindelin wrote:
>> Hi Glen,
>> 
>> On Wed, 6 Apr 2022, Glen Choo wrote:
>> 
>>> Git tries not to distribute configs in-repo because they are a security
>>> risk. However, an attacker can do exactly this if they embed a bare
>>> repo inside of another repo.
>>>
>>> Teach fsck to detect whether a tree object contains a bare repo (as
>>> determined by setup.c) and warn. This will help hosting sites detect and
>>> prevent transmission of such malicious repos.
>>>
>>> See [1] for a more in-depth discussion, including future steps and
>>> alternatives.
>>>
>>> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>> 
>> Out of curiosity: does this new check trigger with
>> https://github.com/libgit2/libgit2? AFAIR it has embedded repositories
>> that are used in its test suite. In other words, libgit2 has a legitimate
>> use case for embedded bare repositories, I believe.
>
> It is definitely good to keep in mind that other repositories have
> included bare repositories for convenience. I'm not sure that the behavior
> of some good actors should outweigh the benefits of protecting against
> this attack vector.
>
> The trouble here is: how could the libgit2 repo change their project to
> not trigger this warning? These bare repos are in their history forever if
> they don't go through significant work and pain to remove them from
> their history. We would want to have a way to make the warnings less
> severe for special cases like this.
>
> Simultaneously, we wouldn't want to bless all _forks_ of libgit2.

Yes, that makes sense. Thanks for the thoughtful reply.
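
For concreteness, here is roughly the kind of check the patch describes, as a simplified C sketch. It is not the code from the patch: struct entry, MODE_TREE/MODE_BLOB and tree_looks_like_bare_repo() are hypothetical names. The heuristic only loosely follows is_git_directory() in setup.c: it looks for a HEAD file plus "objects" and "refs" subtrees, and it does not validate HEAD's contents the way setup.c does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one tree entry (not a Git struct). */
struct entry {
	const char *name;
	unsigned mode;
};

#define MODE_TREE 040000u   /* subtree */
#define MODE_BLOB 0100644u  /* regular file */

/* Linear search for an entry by name; returns NULL if absent. */
static const struct entry *find_entry(const struct entry *entries, size_t n,
				      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(entries[i].name, name))
			return &entries[i];
	return NULL;
}

/*
 * Heuristic loosely modeled on is_git_directory() in setup.c: the tree
 * looks like a bare repo if HEAD is a file and both "objects" and
 * "refs" are subtrees. Only names and modes are inspected.
 */
bool tree_looks_like_bare_repo(const struct entry *entries, size_t n)
{
	const struct entry *head = find_entry(entries, n, "HEAD");
	const struct entry *objects = find_entry(entries, n, "objects");
	const struct entry *refs = find_entry(entries, n, "refs");

	return head && head->mode != MODE_TREE &&
	       objects && objects->mode == MODE_TREE &&
	       refs && refs->mode == MODE_TREE;
}
```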

>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>     trees _or_ based on some set of known commits (in case the known trees
>     are too large).

Since Junio mentioned downthread that we'd need (2), I'll focus on this.
I'm not sure I follow, though, so let me try to verbalize my thought
process to see what I'm not understanding...

By "Suppress warnings on trusted repos", I assume this is done on the
hosting side? (Since I can't imagine a built-in Git feature that could
selectively trust repos.)

"scoped to a specific set of known trees" sounds like fsck.skipList,
i.e. as a host, I can configure a list of "good" libgit2 trees that I
trust, and fsck will skip those.
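
Mechanically, a skip list is just a set-membership test on object names. A rough C sketch, assuming a hypothetical struct skip_list that holds hex object IDs in memory; a sorted array with bsearch() keeps the sketch short, and Git's own implementation may well use a different data structure:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory skip list of hex object IDs. */
struct skip_list {
	const char **oids;
	size_t nr;
};

/* Compare two array elements, each a pointer to a hex string. */
static int cmp_hex(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort once so that each lookup can use binary search. */
void skip_list_prepare(struct skip_list *list)
{
	qsort(list->oids, list->nr, sizeof(*list->oids), cmp_hex);
}

/* True if fsck should skip the object with this hex ID. */
bool skip_list_contains(const struct skip_list *list, const char *oid_hex)
{
	return bsearch(&oid_hex, list->oids, list->nr,
		       sizeof(*list->oids), cmp_hex) != NULL;
}
```

The catch is that every object to be skipped has to be named explicitly, which is where such a list can become unwieldy.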

So from my _very_ naive reading of (2), it seems like we already have
all of the pieces in place for hosts to do (2) on their own, _unless_
we think that fsck.skipList is inadequate for this use case. E.g. I
personally can't imagine any way to list every "good" tree and still
have a cloneable fork of libgit2, so we might need to teach fsck to do
something smarter, like "skip any objects reachable from these commits".
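
One way "skip any objects reachable from these commits" could work is a plain graph walk from the trusted commits that marks everything it reaches; fsck would then suppress the warning for marked objects. A minimal sketch over a hypothetical integer-numbered object graph (struct object_graph and mark_trusted() are made-up names, not Git APIs; a real version would presumably reuse Git's existing revision-walking code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical object graph: objects are numbered 0..nr-1, and edges[i]
 * lists what object i references (commit -> parents and root tree,
 * tree -> subtrees and blobs).
 */
struct object_graph {
	size_t nr;
	const size_t *const *edges;
	const size_t *nr_edges;
};

/*
 * Depth-first walk from the trusted tips. Returns a caller-owned array
 * of nr flags (true = reachable, so trusted), or NULL on allocation
 * failure. Each object is pushed at most once, so the stack never
 * needs more than nr slots.
 */
bool *mark_trusted(const struct object_graph *g, const size_t *tips,
		   size_t nr_tips)
{
	size_t slots = g->nr ? g->nr : 1;
	bool *seen = calloc(slots, sizeof(*seen));
	size_t *stack = malloc(slots * sizeof(*stack));
	size_t top = 0;

	if (!seen || !stack) {
		free(seen);
		free(stack);
		return NULL;
	}
	for (size_t i = 0; i < nr_tips; i++) {
		if (!seen[tips[i]]) {
			seen[tips[i]] = true;
			stack[top++] = tips[i];
		}
	}
	while (top) {
		size_t obj = stack[--top];
		for (size_t j = 0; j < g->nr_edges[obj]; j++) {
			size_t next = g->edges[obj][j];
			if (!seen[next]) {
				seen[next] = true;
				stack[top++] = next;
			}
		}
	}
	free(stack);
	return seen;
}
```

New commits in a fork are not reachable from the trusted tips, so they would still be checked. An object that is also part of the trusted history, such as an unchanged embedded repo, stays trusted even when a fork's new tree refers to it.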