On 2012-04-08 20:18, Michael S. Tsirkin wrote:
> On Sun, Apr 08, 2012 at 07:37:57PM +0200, Jan Kiszka wrote:
>> On 2012-04-08 18:08, Avi Kivity wrote:
>>> On 04/08/2012 07:04 PM, Michael S. Tsirkin wrote:
>>>> On Sun, Apr 08, 2012 at 06:50:27PM +0300, Avi Kivity wrote:
>>>>> On 04/08/2012 06:46 PM, Michael S. Tsirkin wrote:
>>>>>>>>>
>>>>>>>>> I'm thinking about this flow:
>>>>>>>>>
>>>>>>>>> FLR the device
>>>>>>>>> for each emulated register
>>>>>>>>>     read it from the hardware
>>>>>>>>>     if different from emulated register:
>>>>>>>>>         update the internal model (for example, disabling MSI in kvm if
>>>>>>>>>         needed)
>>>>>>>>
>>>>>>>> If we do it this way we get back the problem this patch
>>>>>>>> is trying to solve: MSIX assigned while device
>>>>>>>> memory is disabled would cause unsupported request errors.
>>>>>>>
>>>>>>> Why is that? FLR would presumably disable MSI in the device, and this
>>>>>>> line would disable it in kvm as well.
>>>>>>
>>>>>> The bug is that device memory is disabled (FLR would do that)
>>>>>> while MSI is enabled in kvm. The fix is to
>>>>>> disable MSI in kvm first.
>>>>>
>>>>> Yes, no need to repeat. My question is whether my pseudo-code does the
>>>>> same
>>>>
>>>> It doesn't seem to: FLR (disabling memory) is followed
>>>> by MSI disable in kvm instead of the reverse.
>>>
>>> Ah, so the problem is the ordering? I see.
>>>
>>>>> and whether or not if it is better (when applied to all emulated
>>>>> config space).
>>>>
>>>> I'm not sure.
>>>> I would like to see an example of a register that you have
>>>> in mind.
>>>
>>> I went over the PCI registers and saw none that would be affected.
>>>
>>>>>>
>>>>>> Yes. I'm talking about things like enabling memory, setting up irq register,
>>>>>> etc though. Most of this setup is done by bios.
>>>>>
>>>>> I see. So should we have a pci_reset_function() variant that limits
>>>>> itself to restoring just those bits?
>>>>
>>>> We only need kernel to restore whatever qemu emulates, but
>>>> kernel doesn't know what that is.
>>>> What kind of interface do you have in mind?
>>>>
>>>
>>> The same as pci_reset_function(), but leaves MSI clear.
>>>
>>> I guess it's not worth it if the ordering problem is there.
>>
>> The core problem is not the ordering. The problem is that the kernel is
>> susceptible to ordering mistakes of userspace.
>> And that is because the
>> kernel panics on PCI errors of devices that are in user hands - a
>> critical kernel bug IMHO.
>
> I'm not sure. The pci sysfs interface
> is by design not secured against malicious users,
> isn't it?

That's surely true for devices outside of IOMMU protection. But do we
really have to give up when we encapsulate and isolate them that way?
Provided we moderate access to the sysfs resources via libvirt or some
other management service.

>
>> Proper reset of MSI or even the whole PCI
>> config space is another issue, but one the kernel should not worry about
>> - still, it should be fixed (therefore this patch).
>> But even if we disallowed userland to disable MMIO and PIO access to the
>> device, we would not be able to exclude that there are secret channels
>> in the device's interface having the same effect.
>
> I'm not sure I agree here. If there are secret channels to the device
> that let it violate the PCI express spec, it can probably break the SRIOV
> security model. And then you can do much more than just crash the host.

Maybe, but there are also other devices. And if a guest reprograms it
(firmware update...) and makes it stop reacting to requests, we may get
the same effect. That would also be some kind of a "secret channel".

>
>> So we likely need to
>> enhance PCI error handling to catch and handle faults for certain
>> devices differently - those we cannot trust to behave properly while
>> they are under userland/guest control.
>>
>> Jan
>>
>
>
> I agree - forwarding errors to the guest would actually be very useful - but
> I think we also need to analyse the problem carefully,
> and prevent as many ways as we can for guest to cause trouble.

If possible, the protection should target userspace, which would
automatically include guests. Only if that is not feasible with
reasonable effort do we have to rely on QEMU to save the host.

>
> And there is another issue here: unsupported request errors
> should not cause kernel panics IMO.
>
> There's also the issue that qemu lets guest control the MMIO/PIO
> bits in the command register.
>
> So there are multiple bugs.
>

Yep, that's true.

Jan
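
The ordering point made in this thread (disable MSI in kvm first, then FLR, then resync the emulated registers from hardware) can be sketched as a small toy model. This is only an illustration under stated assumptions, not QEMU or kernel code: the class `ToyAssignedDevice`, the function `reset_assigned_device`, the register names and the register values are all hypothetical and chosen just to make the sequence visible.

```python
from dataclasses import dataclass, field

# Hypothetical post-reset values, for illustration only.
RESET_VALUES = {"command": 0x0000, "msi_control": 0x0000}


@dataclass
class ToyAssignedDevice:
    """Toy stand-in for an assigned device (not a real QEMU/kvm structure)."""
    hardware: dict = field(
        default_factory=lambda: {"command": 0x0006, "msi_control": 0x0001})
    emulated: dict = field(
        default_factory=lambda: {"command": 0x0006, "msi_control": 0x0001})
    kvm_msi_enabled: bool = True
    log: list = field(default_factory=list)

    def disable_msi_in_kvm(self):
        self.kvm_msi_enabled = False
        self.log.append("kvm: MSI disabled")

    def flr(self):
        # The hazard described in the thread: device memory gets disabled
        # by the reset while MSI is still enabled on the kvm side.
        if self.kvm_msi_enabled:
            self.log.append("HAZARD: FLR while MSI still enabled in kvm")
        self.hardware.update(RESET_VALUES)
        self.log.append("device: FLR")

    def resync_emulated_registers(self):
        # "for each emulated register: read it from the hardware,
        #  if different: update the internal model"
        for name in list(self.emulated):
            hw_value = self.hardware[name]
            if hw_value != self.emulated[name]:
                self.emulated[name] = hw_value
                self.log.append(f"model: {name} updated")


def reset_assigned_device(dev):
    """Reset in the order argued for in the thread: kvm MSI off, then FLR."""
    dev.disable_msi_in_kvm()
    dev.flr()
    dev.resync_emulated_registers()
    return dev


if __name__ == "__main__":
    for entry in reset_assigned_device(ToyAssignedDevice()).log:
        print(entry)
```

Calling `flr()` before `disable_msi_in_kvm()` in this model records the hazard entry, which corresponds to the ordering objection raised against the original pseudo-code.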