Re: [PATCH v6 0/5] config: introduce discovery.bare and protected config

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 12 2022, Glen Choo wrote:

> Thanks for following up. I'm a concerned that this thread will be
> unproductive if all we're doing is reiterating our own opinions. I'm ok
> if the conclusion is "agree to disagree", but let's not spend too much
> time talking circles around one another (myself included, of course:)).

Yes, I have not been following up here to merely repeat what's been said
before, but...

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Fri, Jul 01 2022, Glen Choo wrote:
>>>> The "more narrow" and "more secure" go hand-in-hand, since if you work
>>>> on such servers you'd turn this to "always" because you want to read
>>>> such config, but then be left vulnerable to the actual (and muche rarer)
>>>> exploit we're trying to prevent.
>>>
>>> The point that we're not defending bare repo users is fair, but maybe
>>> the group we're trying to protect isn't really dedicated Git-serving
>>> servers. This exploit requires you to have a bare repo inside the
>>> working tree of a non-bare repo. So I think this is less of an issue for
>>> a server, and more for "mixed-use" environments with both regular and
>>> bare clones.
>>
>> Yes, but this is only something that's even a question because of an
>> artificial limitation your proposal here suffers from.
>>
>> I.e. in trying to detect nefarious repos where you've got "looks like
>> bare" content *tracked* in another repo you're conflating it with *any
>> bare repo*.
>>
>> And the only reason we're doing that seems to me to be a premature
>> optimization.
>
> Right, I hear you. Besides performance,[...]

...have been following up because it's still genuinely unclear to me
what data or design constraints led to this solution. I.e. in [1] you
noted ("[...]" interjection is mine):

	"I don't see how we could implement this [the "walk-up" method]
	without imposing a big penalty to all bare repo users[...]."

[Continued below]

> let me offer the perspective
> that I should have led with in the previous email. In this thread and
> the original "embedded bare repo" one [1], there is a huge diversity of
> opinion on what the default behavior should be, e.g.:

I read that thread over again, and some of the highlights were:

 * brian asking if we can't basically do the "walk up" method:
   https://lore.kernel.org/git/Yk9hONuCIVIq6ieV@xxxxxxxxxxxxxxxxxxxxxxxxx/

 * Taylor wondering how much we need to worry about this attack (among
   other things) & worrying about legitimate "bare repo" workflows being
   broken: https://lore.kernel.org/git/YloTQH35r2xVdPm1@nand.local/ &
   https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/

But most importantly, here's something I hadn't noticed before:

 * Emily talking about the supposed slowness of the "walk up" method:
   https://lore.kernel.org/git/CAJoAoZkgnnvdymuBsM9Ja3+eYSnyohr=FQZMVX_uzZ_pkQhgaw@xxxxxxxxxxxxxx/

I.e.:

	"wantonly scanning up the filesystem for any gitdir above the
	current one is really expensive. When I tried that approach for
	the purposes of including some shared config between
	superproject and submodules, it slowed down the Git test suite
	by something like 3-5x."

Which I'm now 99.99% certain based on past context[2] is a misstatement
or misrecollection about an early version of
submodule.superprojectGitDir v.s. what setup.c would do.

I.e. that 3-5x slowness referred to git-submodule.sh shelling out to
"git rev-parse", it's not a reference to the expense of the few syscalls
we'd need to make to discover a parent git directory.

Did you hear about the directory walking being a performance concern
from Emily, or was it an independent discovery?

It seems as though this might have come about because of a
misrecollection about the git-rev-parse(1)/git-submodule.sh v.s. setup.c
performance with reference to submodule.superprojectGitDir, and that
we've now got a design that's optimized to avoid a performance problem
that doesn't exist, at the cost of accuracy.

And not to reiterate, but I think the performance isn't a concern
per-se, but rather that performance concerns seem to have driven one
design over another.

> - How do we detect an embedded bare repo (fsck check? walk up [and check
>   if it's tracked]?)
> - What to do when we detect one (ignore the config? block the repo?)
> - How to preserve workflows that rely on embedded bare repos (some kind
>   of (global|per-repo) exception list? allow the repo but not the
>   config?)
>
> And rightfully so! There are a lot of options here, so we want to make
> sure we get the defaults right. But at the same time that implies a
> pretty slow, difficult process.

I saw some implementation discussion about how we'd do this with fsck,
which is one thing, but I don't really see the trickyness or ambiguity
on the client side.

I.e. we know when we'd "find a repo", so that's the criteria we'd use to
ignore such a contained repo or not. The only trickyness seems to come
about if the approach we pick is one where we conflate embedded bare
repos v.s. non-embedded bare repos.

> On the other hand, I haven't seen nearly as much disagreement on "just
> refuse to work with bare repos" because it's so restrictive that it
> probably won't be the default. So it'll have no effect on most users,
> but still confers protection for the subset of users who can benefit
> from it. For those who want the problem fixed _today_ (e.g. my
> employer), this seems like simple, low-hanging fruit that buys time for
> us to find good default.
>
> FWIW, when time permits I'd be happy to work on that good default (which
> will probably be some variant of "walk up"), and to pay off the tech
> debt introduced by this implementation (I have some ideas about how we
> could improve the config API to achieve this [2]). Hopefully that helps
> allay some of your concerns?

It really just seems like a dead end to me, sorry.

I.e. we know what the security problem is, but the side-effects of this
approach are such that we'll probably never turn it on by default.

So that'll mean that the vast majority of users who could benefit from
the security mitigation won't even know about the config, or if they do
might not have it turned on.

And yes, we might end up with a better design later, but then we'll have
to still support this config mechanism, potentially deprecate it etc.

> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> [2] https://lore.kernel.org/git/kl6lr13fi9qn.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

1. https://lore.kernel.org/git/kl6lee1z8mcm.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
2. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@xxxxxxxxxxxxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux