On Tue, Dec 05, 2023 at 11:40:47AM +0000, Catalin Marinas wrote:
> > - Will had unanswered questions in another part of the thread:
> >
> > https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
> >
> > Can someone please help concluding it?
>
> Is this about reclaiming the device? I think we concluded that we can't
> generalise this beyond PCIe, though not sure there was any formal
> statement to that thread. The other point Will had was around stating
> in the commit message why we only relax this to Normal NC. I haven't
> checked the commit message yet, it needs careful reading ;).

Not quite. We said reclaiming is VFIO's problem, and if VFIO can't
reliably reclaim a device it shouldn't create it in the first place.

Again, I think a lot of this is trying to take VFIO problems into KVM.
VFIO devices should not exist if they pose a harm to the system. If
VFIO decided to create the devices anyhow (e.g. admin override or
something) then it is not KVM's job to do any further enforcement.

Remember, the feedback we got from the CPU architects was that even
DEVICE_* will experience an uncontained failure if the device triggers
an error response in shipping ARM IP.

The reason PCIe is safe is that the PCI bridge does not generate
errors in the first place!

Thus, the way a platform device can actually be safe is if it too
never generates errors in the first place! Obviously this approach
works just as well with NORMAL_NC.

If a platform device does generate errors then we shouldn't expect
containment at all, and the memory type has no bearing on the safety.
The correct answer is to block these platform devices from
VFIO/KVM/etc. because they can trigger uncontained failures.

If you have learned something different then please share it. IOW,
what is the practical scenario where DEVICE_* has contained errors but
NORMAL_NC does not?

Jason
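A minimal C sketch of the argument above, under stated assumptions: every name here (struct hyp_dev, hyp_error_reaches_cpu, hyp_failure_contained, hyp_vfio_may_expose, the S2_* enum) is hypothetical and is not a kernel, KVM or VFIO API. It models only the claim made in this mail: whether a failure is contained depends on whether an error response can reach the CPU at all (a PCI bridge never generates one; a platform device may), and the stage-2 memory type does not enter the decision.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not kernel APIs. */
enum s2_memtype { S2_DEVICE_nGnRE, S2_NORMAL_NC };

struct hyp_dev {
    bool behind_pcie_bridge; /* PCIe: the bridge never returns an error response */
    bool may_signal_error;   /* device itself can raise an error response on access */
};

/* An error response reaches the CPU only if nothing in front of the
 * device absorbs it. The memory type does not enter this decision. */
static bool hyp_error_reaches_cpu(const struct hyp_dev *d)
{
    return !d->behind_pcie_bridge && d->may_signal_error;
}

/* Modeled on the claim that shipping Arm IP gives no containment for
 * either DEVICE_* or NORMAL_NC once an error response is generated. */
static bool hyp_failure_contained(const struct hyp_dev *d, enum s2_memtype mt)
{
    (void)mt; /* intentionally ignored: same answer for DEVICE_* and NORMAL_NC */
    return !hyp_error_reaches_cpu(d);
}

/* Hypothetical VFIO-side policy: only expose devices whose failures are
 * contained, checked for both memory types to make the point explicit. */
static bool hyp_vfio_may_expose(const struct hyp_dev *d)
{
    return hyp_failure_contained(d, S2_DEVICE_nGnRE) &&
           hyp_failure_contained(d, S2_NORMAL_NC);
}
```

In this model a PCIe endpoint and a platform device that never signals errors are both exposable under either memory type, while a platform device that can signal errors is rejected regardless of DEVICE_* vs NORMAL_NC. The enforcement point is the VFIO-side check, not the choice of stage-2 attribute.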