* Leon Romanovsky (leon@xxxxxxxxxx) wrote:
> On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> >
> > > * Reshetova, Elena (elena.reshetova@xxxxxxxxx) wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start using them
> > > > > > > > right away even without any special HW (readme has description of
> > > > > > > > what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > > > it?
> > > > > >
> > > > > > Sorry, I didn't know that for every bug that is found in linux kernel when
> > > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > > We will fix this in the future submissions, but some bugs we have are found
> > > > > > by plain code audit, so 'human' is the tool.
> > > > >
> > > > > My problem with that statement is that by applying different threat
> > > > > model you "invent" bugs which didn't exist in a first place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled correct
> > > > > behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@xxxxxxxxxxxxxxx/
> > > >
> > > > Hm..
> > > > Does everyone think that when kernel dies with unhandled page fault
> > > > (such as in that case) or detection of a KASAN out of bounds violation
> > > > (as it is in some other cases we already have fixes or investigating)
> > > > it represents a correct behavior even if you expect that all your pci
> > > > HW devices are trusted? What about an error in two consequent pci
> > > > reads? What about just some failure that results in erroneous input?
> > >
> > > I'm not sure you'll get general agreement on those answers for all
> > > devices and situations; I think for most devices for non-CoCo
> > > situations, then people are generally OK with a misbehaving PCI device
> > > causing a kernel crash, since most people are running without IOMMU
> > > anyway, a misbehaving device can cause otherwise undetectable chaos.
> >
> > Ok, if this is a consensus within the kernel community, then we can consider
> > the fixes strictly from the CoCo threat model point of view.
> >
> > >
> > > I'd say:
> > > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > > guarantee forward progress or stop the hypervisor doing something
> > > truly stupid.
> >
> > Yes, denial of service is out of scope but I would not pile all crashes as
> > 'safe' automatically. Depending on the crash, it can be used as a
> > primitive to launch further attacks: privilege escalation, information
> > disclosure and corruption. It is especially true for memory corruption
> > issues.
> >
> > > b) For CoCo, information disclosure, or corruption IS a problem
> >
> > Agreed, but the path to this can incorporate a number of attack
> > primitives, as well as bug chaining. So, if the bug is detected, and
> > fix is easy, instead of thinking about possible implications and its
> > potential usage in exploit writing, safer to fix it.
> >
> > >
> > > c) For non-CoCo some people might care about robustness of the kernel
> > > against a failing PCI device, but generally I think they worry about
> > > a fairly clean failure, even in the unexpected-hot unplug case.
> >
> > Ok.
>
> With my other hat as a representative of hardware vendor (at least for
> NIC part), who cares about quality of our devices, we don't want to hide
> ANY crash related to our devices, especially if it is related to misbehaving
> PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
> QA/customer support much harder.

Yeh if you're adding new code to be more careful, you want the code to
fail/log the problem, not hide it.
(Although heck, I suspect there are a million apparently working PCI
cards out there that break some spec somewhere).

Dave

> Thanks
-- 
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK