Re: Linux guest kernel threat model for Confidential Computing

"Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> · Thu, 26 Jan 2023 18:14:28 +0000

* Leon Romanovsky (leon@xxxxxxxxxx) wrote:
> On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> > 
> > > * Reshetova, Elena (elena.reshetova@xxxxxxxxx) wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start using them
> > > > > > > > right away even without any special HW (readme has description of
> > > > > > > > what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > > > it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > > We will fix this in the future submissions, but some bugs we have are
> > > > > > found by plain code audit, so 'human' is the tool.
> > > > >
> > > > > My problem with that statement is that by applying a different threat
> > > > > model you "invent" bugs which didn't exist in the first place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled correct
> > > > > behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@xxxxxxxxxxxxxxx/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > > (such as in that case) or detection of a KASAN out of bounds violation
> > > > (as it is in some other cases we already have fixes or investigating) it
> > > > represents a correct behavior even if you expect that all your pci HW
> > > > devices are trusted? What about an error in two consecutive pci reads?
> > > > What about just some failure that results in erroneous input?
> > > 
> > > I'm not sure you'll get general agreement on those answers for all
> > > devices and situations; I think for most devices in non-CoCo
> > > situations, people are generally OK with a misbehaving PCI device
> > > causing a kernel crash, since most people are running without an IOMMU
> > > anyway, where a misbehaving device can cause otherwise undetectable chaos.
> > 
> > Ok, if this is a consensus within the kernel community, then we can consider
> > the fixes strictly from the CoCo threat model point of view. 
> > 
> > > 
> > > I'd say:
> > >   a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > >   guarantee forward progress or stop the hypervisor doing something
> > >   truly stupid.
> > 
> > Yes, denial of service is out of scope but I would not pile all crashes as
> > 'safe' automatically. Depending on the crash, it can be used as a
> > primitive to launch further attacks: privilege escalation, information
> > disclosure and corruption. It is especially true for memory corruption
> > issues. 
> > 
> > >   b) For CoCo, information disclosure, or corruption IS a problem
> > 
> > Agreed, but the path to this can incorporate a number of attack
> > primitives, as well as bug chaining. So, if the bug is detected and the
> > fix is easy, instead of thinking about possible implications and its
> > potential usage in exploit writing, it is safer to fix it.
> > 
> > > 
> > >   c) For non-CoCo some people might care about robustness of the kernel
> > >   against a failing PCI device, but generally I think they worry about
> > >   a fairly clean failure, even in the unexpected-hot unplug case.
> > 
> > Ok.
> 
> With my other hat as a representative of hardware vendor (at least for
> NIC part), who cares about quality of our devices, we don't want to hide
> ANY crash related to our devices, especially if it is related to misbehaving
> PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
> QA/customer support much harder.

Yeh if you're adding new code to be more careful, you want the code to
fail/log the problem, not hide it.
(Although heck, I suspect there are a million apparently working PCI
cards out there that break some spec somewhere).
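
As a rough illustration of "fail/log, don't hide", here is a minimal
user-space sketch. It is not taken from any driver: `fake_desc`,
`rx_copy_checked` and `RX_BUF_SIZE` are hypothetical names, and `fprintf`
stands in for whatever logging a real driver would use. The point is only
that an out-of-range length from the device gets reported and rejected
rather than silently clamped.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RX_BUF_SIZE 256u /* hypothetical receive buffer size */

/* Hypothetical stand-in for a descriptor filled in by a device. */
struct fake_desc {
	uint32_t len;        /* length claimed by the device (untrusted) */
	const uint8_t *data; /* payload the device points us at */
};

/*
 * Copy a device-supplied payload into a fixed-size buffer.
 *
 * A length larger than the buffer is logged and rejected with -EIO.
 * It is deliberately not clamped: clamping would avoid the overflow
 * but would also hide the fact that the device misbehaved.
 */
static int rx_copy_checked(uint8_t *dst, size_t dst_size,
			   const struct fake_desc *desc)
{
	if (desc->len > dst_size) {
		fprintf(stderr,
			"rx: device reported len %u > buffer %zu, rejecting\n",
			(unsigned int)desc->len, dst_size);
		return -EIO;
	}

	memcpy(dst, desc->data, desc->len);
	return (int)desc->len;
}
```

With a 256-byte buffer, a descriptor claiming 16 bytes is copied and 16 is
returned; one claiming 512 bytes produces a log line and -EIO, so the
broken device stays visible.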

Dave

> Thanks
> 
-- 
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK