Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 5 Dec 2023 09:05:17 -0400

On Tue, Dec 05, 2023 at 11:40:47AM +0000, Catalin Marinas wrote:
> > - Will had unanswered questions in another part of the thread:
> > 
> >   https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
> > 
> >   Can someone please help concluding it?
> 
> Is this about reclaiming the device? I think we concluded that we can't
> generalise this beyond PCIe, though not sure there was any formal
> statement to that thread. The other point Will had was around stating
> in the commit message why we only relax this to Normal NC. I haven't
> checked the commit message yet, it needs careful reading ;).

Not quite, we said reclaiming is VFIO's problem and if VFIO can't
reliably reclaim a device it shouldn't create it in the first place.

Again, I think alot of this is trying to take VFIO problems into KVM.

VFIO devices should not exist if they pose a harm to the system. If
VFIO decided to create the devices anyhow (eg admin override or
something) then it is not KVM's job to do any further enforcement.

Remember, the feedback we got from the CPU architects was that even
DEVICE_* will experience an uncontained failure if the device tiggers
an error response in shipping ARM IP.

The reason PCIe is safe is because the PCI bridge does not generate
errors in the first place!

Thus, the way a platform device can actually be safe is if it too
never generates errors in the first place! Obviously this approach
works just as well with NORMAL_NC.

If a platform device does generate errors then we shouldn't expect
containment at all, and the memory type has no bearing on the
safety. The correct answer is to block these platform devices from
VFIO/KVM/etc because they can trigger uncontained failures.

If you have learned something different then please share it..

IOW, what is the practical scenario where DEVICE_* has contained
errors but NORMAL_NC does not?

Jason