On 2012-02-28 00:15, Alex Williamson wrote: > On Mon, 2012-02-27 at 23:07 +0100, Jan Kiszka wrote: >> On 2012-02-27 22:05, Alex Williamson wrote: >>> On Fri, 2012-02-10 at 19:17 +0100, Jan Kiszka wrote: >>>> PCI 2.3 allows to generically disable IRQ sources at device level. This >>>> enables us to share legacy IRQs of such devices with other host devices >>>> when passing them to a guest. >>>> >>>> The new IRQ sharing feature introduced here is optional, user space has >>>> to request it explicitly. Moreover, user space can inform us about its >>>> view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the >>>> interrupt and signaling it if the guest masked it via the virtualized >>>> PCI config space. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> >>>> --- >>>> >>>> Changes in v3: >>>> - rebased over current kvm.git (no code conflict, just api.txt) >>>> >>>> Documentation/virtual/kvm/api.txt | 31 ++++++ >>>> arch/x86/kvm/x86.c | 1 + >>>> include/linux/kvm.h | 6 + >>>> include/linux/kvm_host.h | 2 + >>>> virt/kvm/assigned-dev.c | 208 +++++++++++++++++++++++++++++++----- >>>> 5 files changed, 219 insertions(+), 29 deletions(-) >>>> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt >>>> index 59a3826..5ce0e29 100644 >>>> --- a/Documentation/virtual/kvm/api.txt >>>> +++ b/Documentation/virtual/kvm/api.txt >>>> @@ -1169,6 +1169,14 @@ following flags are specified: >>>> >>>> /* Depends on KVM_CAP_IOMMU */ >>>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>>> +/* The following two depend on KVM_CAP_PCI_2_3 */ >>>> +#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>>> +#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) >>>> + >>>> +If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts >>>> +via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with other >>>> +assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the >>>> +guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details. >>>> >>>> The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure >>>> isolation of the device. Usages not specifying this flag are deprecated. >>>> @@ -1441,6 +1449,29 @@ The "num_dirty" field is a performance hint for KVM to determine whether it >>>> should skip processing the bitmap and just invalidate everything. It must >>>> be set to the number of set bits in the bitmap. >>>> >>>> +4.60 KVM_ASSIGN_SET_INTX_MASK >>>> + >>>> +Capability: KVM_CAP_PCI_2_3 >>>> +Architectures: x86 >>>> +Type: vm ioctl >>>> +Parameters: struct kvm_assigned_pci_dev (in) >>>> +Returns: 0 on success, -1 on error >>>> + >>>> +Informs the kernel about the guest's view on the INTx mask. As long as the >>>> +guest masks the legacy INTx, the kernel will refrain from unmasking it at >>>> +hardware level and will not assert the guest's IRQ line. User space is still >>>> +responsible for applying this state to the assigned device's real config space >>>> +by setting or clearing the Interrupt Disable bit 10 in the Command register. >>>> + >>>> +To avoid that the kernel overwrites the state user space wants to set, >>>> +KVM_ASSIGN_SET_INTX_MASK has to be called prior to updating the config space. >>>> +Moreover, user space has to write back its own view on the Interrupt Disable >>>> +bit whenever modifying the Command word. >>> >>> This is very confusing. I think we need to work on the wording, but >>> perhaps it's not worth hold up the patch. It seems the simplest, >> >> As I need another round anyway (see below), I'm open for better wording >> suggestions. > > Now that I know what it does, I'll probably write something just as > confusing, but here's a shot: > > Allows userspace to mask PCI INTx interrupts from the assigned > device. The kernel will not deliver INTx interrupts to the > guest between setting and clearing of KVM_ASSIGN_SET_INTX_MASK > via this interface. This enables use of and emulation of PCI > 2.3 INTx disable command register behavior. > > This may be used for both PCI 2.3 devices supporting INTx > disable natively and older devices lacking this support. > Userspace is responsible for emulating the read value of the > INTx disable bit in the guest visible PCI command register. > When modifying the INTx disable state, userspace should precede > updating the physical device command register by calling this > ioctl to inform the kernel of the new intended INTx mask state. > > Note that the kernel uses the device INTx disable bit to > internally manage the device interrupt state for PCI 2.3 > devices. Reads of this register may therefore not match the > expected value. Writes should always use the guest intended > INTx disable value rather than attempting to read-copy-update > the current physical device state. Races between user and > kernel updates to the INTx disable bit are handled lazily in the > kernel. It's possible the device may generate unintended > interrupts, but they will not be injected into the guest. Sounds good, will pick it up. ... >>>> @@ -759,6 +851,56 @@ msix_entry_out: >>>> } >>>> #endif >>>> >>>> +static int kvm_vm_ioctl_set_pci_irq_mask(struct kvm *kvm, >>>> + struct kvm_assigned_pci_dev *assigned_dev) >>>> +{ >>>> + int r = 0; >>>> + struct kvm_assigned_dev_kernel *match; >>>> + >>>> + mutex_lock(&kvm->lock); >>>> + >>>> + match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, >>>> + assigned_dev->assigned_dev_id); >>>> + if (!match) { >>>> + r = -ENODEV; >>>> + goto out; >>>> + } >>>> + >>>> + mutex_lock(&match->intx_mask_lock); >>>> + >>>> + match->flags &= ~KVM_DEV_ASSIGN_MASK_INTX; >>>> + match->flags |= assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX; >>>> + >>>> + if (match->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) { >>>> + if (assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX) { >>>> + kvm_set_irq(match->kvm, match->irq_source_id, >>>> + match->guest_irq, 0); >>>> + /* >>>> + * Masking at hardware-level is performed on demand, >>>> + * i.e. when an IRQ actually arrives at the host. >>>> + */ >>>> + } else { >>>> + /* >>>> + * Unmask the IRQ line. It may have been masked >>>> + * meanwhile if we aren't using PCI 2.3 INTx masking >>>> + * on the host side. >>>> + */ >>>> + spin_lock_irq(&match->intx_lock); >>>> + if (match->host_irq_disabled) { >>>> + enable_irq(match->host_irq); >>> >>> How do we not get an unbalanced enable here for PCI 2.3 devices? >> >> By performing both the disable and the host_irq_disabled update under >> intx_lock? Or which scenario do you see? > > Do we ever do disable_irq() on a PCI 2.3 device? Seems like we only use > the intx API for them, and disable/enable_irq exclusively on non-2.3, so > if we can get here on a 2.3 device we have unbalanced enables. Am I > missing some nuance of host_irq_disabled that prevents 2.3 from getting > here? Thanks, OK, now I got it. Yes, that's indeed a bug. I dug in an older version of this patch, and there I had multiple state values to encode if the line or the device was disabled. Will limit to pre-2.3 devices. Thanks, Jan
Attachment:
signature.asc
Description: OpenPGP digital signature