On Sun, Apr 08, 2012 at 08:39:35PM +0200, Jan Kiszka wrote:
> On 2012-04-08 20:18, Michael S. Tsirkin wrote:
> > On Sun, Apr 08, 2012 at 07:37:57PM +0200, Jan Kiszka wrote:
> >> On 2012-04-08 18:08, Avi Kivity wrote:
> >>> On 04/08/2012 07:04 PM, Michael S. Tsirkin wrote:
> >>>> On Sun, Apr 08, 2012 at 06:50:27PM +0300, Avi Kivity wrote:
> >>>>> On 04/08/2012 06:46 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I'm thinking about this flow:
> >>>>>>>>>
> >>>>>>>>> FLR the device
> >>>>>>>>> for each emulated register
> >>>>>>>>>     read it from the hardware
> >>>>>>>>>     if different from emulated register:
> >>>>>>>>>         update the internal model (for example, disabling MSI in kvm if
> >>>>>>>>>         needed)
> >>>>>>>>
> >>>>>>>> If we do it this way we get back the problem this patch
> >>>>>>>> is trying to solve: MSIX assigned while device
> >>>>>>>> memory is disabled would cause unsupported request errors.
> >>>>>>>
> >>>>>>> Why is that? FLR would presumably disable MSI in the device, and this
> >>>>>>> line would disable it in kvm as well.
> >>>>>>
> >>>>>> The bug is that device memory is disabled (FLR would do that)
> >>>>>> while MSI is enabled in kvm. The fix is to
> >>>>>> disable MSI in kvm first.
> >>>>>
> >>>>> Yes, no need to repeat. My question is whether my pseudo-code does the
> >>>>> same
> >>>>
> >>>> It doesn't seem to: FLR (disabling memory) is followed
> >>>> by MSI disable in kvm instead of the reverse.
> >>>
> >>> Ah, so the problem is the ordering? I see.
> >>>
> >>>>> and whether or not it is better (when applied to all emulated
> >>>>> config space).
> >>>>
> >>>> I'm not sure.
> >>>> I would like to see an example of a register that you have
> >>>> in mind.
> >>>
> >>> I went over the PCI registers and saw none that would be affected.
> >>>
> >>>>>>
> >>>>>> Yes. I'm talking about things like enabling memory, setting up irq register,
> >>>>>> etc though. Most of this setup is done by the BIOS.
> >>>>>
> >>>>> I see.
> >>>>> So should we have a pci_reset_function() variant that limits
> >>>>> itself to restoring just those bits?
> >>>>
> >>>> We only need the kernel to restore whatever qemu emulates, but
> >>>> the kernel doesn't know what that is.
> >>>> What kind of interface do you have in mind?
> >>>>
> >>>
> >>> The same as pci_reset_function(), but leaves MSI clear.
> >>>
> >>> I guess it's not worth it if the ordering problem is there.
> >>
> >> The core problem is not the ordering. The problem is that the kernel is
> >> susceptible to ordering mistakes of userspace.
> >> And that is because the
> >> kernel panics on PCI errors of devices that are in user hands - a
> >> critical kernel bug IMHO.
> >
> > I'm not sure. The pci sysfs interface
> > is by design not secured against malicious users,
> > isn't it?
>
> That's surely true for devices outside of IOMMU protection. But do we
> really have to give up when we encapsulate and isolate them that way?
> Provided we moderate access to the sysfs resources via libvirt or some
> other management service.

We don't have to give up, but we'd have to build such an interface:
the /config attribute is not it.

> >
> >> Proper reset of MSI or even the whole PCI
> >> config space is another issue, but one the kernel should not worry about
> >> - still, it should be fixed (therefore this patch).
> >> But even if we disallowed userland to disable MMIO and PIO access to the
> >> device, we would not be able to exclude that there are secret channels
> >> in the device's interface having the same effect.
> >
> > I'm not sure I agree here. If there are secret channels to the device
> > that let it violate the PCI Express spec, it can probably break the SRIOV
> > security model. And then you can do much more than just crash the host.
>
> Maybe, but there are also other devices. And if a guest reprograms it
> (firmware update...) and makes it stop reacting to requests, we may get
> the same effect. That would also be some kind of a "secret channel".

Right.
So it looks like an SRIOV VF is the only type of device that is safe to
assign to a guest: presumably, SRIOV VFs don't let the driver program the
firmware. And I think SRIOV VFs don't have MMIO/PIO enable bits either,
and the BAR isn't programmable through the VF...

> >
> >> So we likely need to
> >> enhance PCI error handling to catch and handle faults for certain
> >> devices differently - those we cannot trust to behave properly while
> >> they are under userland/guest control.
> >>
> >> Jan
> >>
> >
> >
> > I agree - forwarding errors to the guest would actually be very useful - but
> > I think we also need to analyse the problem carefully,
> > and prevent as many ways as we can for the guest to cause trouble.
>
> If possible, the protection should target userspace which would
> automatically include guests. Only if that is not feasible with
> reasonable effort, we have to rely on QEMU to save the host.

Defence in depth is best, right?

> >
> > And there is another issue here: unsupported request errors
> > should not cause kernel panics IMO.
> >
> > There's also the issue that qemu lets the guest control the MMIO/PIO
> > bits in the command register.
> >
> > So there are multiple bugs.
> >
>
> Yep, that's true.
>
> Jan
>
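
For reference, the ordering problem discussed in this thread can be sketched as a toy model. This is only an illustration under stated assumptions: the class and function names are hypothetical (they are not QEMU or kernel APIs), and the model assumes, as the thread does, that FLR disables both device memory and MSI in the device. It counts the moments at which MSI is still enabled in kvm while device memory is disabled, which is the state the patch is trying to avoid.

```python
from dataclasses import dataclass


@dataclass
class AssignedDevice:
    """Toy model of an assigned device; all names are hypothetical."""
    memory_enabled: bool = True    # hardware: memory decoding enabled
    msi_enabled_hw: bool = True    # hardware: MSI enable bit
    msi_enabled_kvm: bool = True   # emulated model: MSI set up in kvm
    unsafe_windows: int = 0        # times kvm MSI was on while memory was off

    def _check(self):
        # The state the thread calls the bug: memory disabled, MSI enabled in kvm.
        if self.msi_enabled_kvm and not self.memory_enabled:
            self.unsafe_windows += 1

    def flr(self):
        # Assumption from the thread: FLR disables memory and MSI in the device.
        self.memory_enabled = False
        self.msi_enabled_hw = False
        self._check()

    def disable_msi_in_kvm(self):
        self.msi_enabled_kvm = False
        self._check()

    def sync_model_from_hardware(self):
        # "read it from the hardware; if different, update the internal model"
        if self.msi_enabled_kvm != self.msi_enabled_hw:
            self.msi_enabled_kvm = self.msi_enabled_hw
        self._check()


def reset_flr_first():
    """The quoted pseudo-code flow: FLR, then resync the emulated state."""
    dev = AssignedDevice()
    dev.flr()
    dev.sync_model_from_hardware()
    return dev


def reset_msi_first():
    """The ordering described as the fix: disable MSI in kvm, then FLR."""
    dev = AssignedDevice()
    dev.disable_msi_in_kvm()
    dev.flr()
    dev.sync_model_from_hardware()
    return dev


if __name__ == "__main__":
    print("FLR first, unsafe windows:", reset_flr_first().unsafe_windows)
    print("MSI first, unsafe windows:", reset_msi_first().unsafe_windows)
```

Both orderings end in the same final state; they differ only in whether the device passes through the unsafe intermediate state, which is why the thread treats this as an ordering problem rather than a missing reset step.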