On 12/08/2014 06:12 AM, Alex Williamson wrote: > On Mon, 2014-12-08 at 04:58 +0000, Wu, Feng wrote: >> >>> -----Original Message----- >>> From: Eric Auger [mailto:eric.auger@xxxxxxxxxx] >>> Sent: Thursday, December 04, 2014 11:36 PM >>> To: Wu, Feng; tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; hpa@xxxxxxxxx; >>> x86@xxxxxxxxxx; gleb@xxxxxxxxxx; pbonzini@xxxxxxxxxx; >>> dwmw2@xxxxxxxxxxxxx; joro@xxxxxxxxxx; alex.williamson@xxxxxxxxxx; >>> jiang.liu@xxxxxxxxxxxxxxx >>> Cc: linux-kernel@xxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; >>> kvm@xxxxxxxxxxxxxxx >>> Subject: Re: [v2 18/25] KVM: kvm-vfio: implement the VFIO skeleton for VT-d >>> Posted-Interrupts >>> >>> Hi Feng, >>> >>> On 12/03/2014 08:39 AM, Feng Wu wrote: >>>> This patch adds the kvm-vfio interface for VT-d Posted-Interrrupts. >>>> When guests updates MSI/MSI-x information for an assigned-device, >>> update >>>> QEMU will use KVM_DEV_VFIO_DEVICE_POSTING_IRQ attribute to setup >>>> IRTE for VT-d PI. This patch implement this IRQ attribute. >>> s/implement/implements >>>> >>>> Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx> >>>> --- >>>> include/linux/kvm_host.h | 19 ++++++++ >>>> virt/kvm/vfio.c | 103 >>> ++++++++++++++++++++++++++++++++++++++++++++++ >>>> 2 files changed, 122 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >>>> index 5cd4420..8d06678 100644 >>>> --- a/include/linux/kvm_host.h >>>> +++ b/include/linux/kvm_host.h >>>> @@ -1134,6 +1134,25 @@ static inline int >>> kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq, >>>> } >>>> #endif >>>> >>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POSTING >>>> +/* >>>> + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts >>>> + * >>>> + * @kvm: kvm >>>> + * @host_irq: host irq of the interrupt >>>> + * @guest_irq: gsi of the interrupt >>>> + * returns 0 on success, < 0 on failure >>>> + */ >>>> +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq, >>>> + uint32_t guest_irq); >>>> +#else >>>> +static int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int >>> host_irq, >>>> + uint32_t guest_irq) >>>> +{ >>>> + return 0; >>>> +} >>>> +#endif >>>> + >>>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT >>>> >>>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool >>> val) >>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c >>>> index 6bc7001..5e5515f 100644 >>>> --- a/virt/kvm/vfio.c >>>> +++ b/virt/kvm/vfio.c >>>> @@ -446,6 +446,99 @@ out: >>>> return ret; >>>> } >>>> >>>> +static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type) >>>> +{ >>>> + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { >>>> + u8 pin; >>>> + >>>> + pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin); >>>> + if (pin) >>>> + return 1; >>>> + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) >>>> + return pci_msi_vec_count(pdev); >>>> + else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) >>>> + return pci_msix_vec_count(pdev); >>>> + >>>> + return 0; >>>> +} >>> for platform case I was asked to move the retrieval of absolute irq >>> number to the architecture specific part. I don't know if it should >>> apply to PCI stuff as well? This explains why I need to pass the VFIO >>> device (or struct device handle) to the arch specific part. Actually we >>> do the same job, we provide a phys/virt IRQ mapping to KVM, right? So to >>> me our architecture specific API should look quite similar? >> >> In my patch, QEMU passes IRQ type(MSI/MSIx in my case), VFIO device index, >> and sub-index via "struct kvm_vfio_dev_irq" to KVM, then KVM will find the >> real host irq from the VFIO device index and the IRQ type. Is this something >> similar with your patch? >> >>> >>>> + >>>> +static int kvm_vfio_set_pi(struct kvm_device *kdev, int32_t __user *argp) >>>> +{ >>>> + struct kvm_vfio_dev_irq pi_info; >>>> + uint32_t *gsi; >>>> + unsigned long minsz; >>>> + struct vfio_device *vdev; >>>> + struct msi_desc *entry; >>>> + struct device *dev; >>>> + struct pci_dev *pdev; >>>> + int i, max, ret; >>>> + >>>> + minsz = offsetofend(struct kvm_vfio_dev_irq, count); >>>> + >>>> + if (copy_from_user(&pi_info, (void __user *)argp, minsz)) >>>> + return -EFAULT; >>>> + >>>> + if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS) >>> PCI specific check, same remark as above but I will let Alex further >>> comment on this and possibly invalidate this commeny ;-) >>>> + return -EINVAL; >>>> + >>>> + vdev = kvm_vfio_get_vfio_device(pi_info.fd); >>>> + if (IS_ERR(vdev)) >>>> + return PTR_ERR(vdev); >>>> + >>>> + dev = kvm_vfio_external_base_device(vdev); >>>> + if (!dev || !dev_is_pci(dev)) { >>>> + ret = -EFAULT; >>>> + goto put_vfio_device; >>>> + } >>>> + >>>> + pdev = to_pci_dev(dev); >>>> + >>>> + max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index); >>>> + if (max <= 0) { >>>> + ret = -EFAULT; >>>> + goto put_vfio_device; >>>> + } >>>> + >>>> + if (pi_info.argsz - minsz < pi_info.count * sizeof(int) || >>> shouldn' we use the actual datatype? >> >> I am afraid I don't get this, could you please be more specific? Thanks a lot! > > We could have a platform that supports 64bit INTs. yes this is what I meant (struct datatype is __u32). Thanks Eric > >>>> + pi_info.start >= max || pi_info.start + pi_info.count > max) { >>>> + ret = -EINVAL; >>>> + goto put_vfio_device; >>>> + } >>>> + >>>> + gsi = memdup_user((void __user *)((unsigned long)argp + minsz), >>>> + pi_info.count * sizeof(int)); >>> same question as above >>>> + if (IS_ERR(gsi)) { >>>> + ret = PTR_ERR(gsi); >>>> + goto put_vfio_device; >>>> + } >>>> + >>>> +#ifdef CONFIG_PCI_MSI >>>> + for (i = 0; i < pi_info.count; i++) { >>>> + list_for_each_entry(entry, &pdev->msi_list, list) { >>>> + if (entry->msi_attrib.entry_nr != pi_info.start+i) >>>> + continue; >>>> + >>>> + ret = kvm_arch_vfio_update_pi_irte(kdev->kvm, >>>> + entry->irq, >>>> + gsi[i]); >>>> + if (ret) { >>>> + ret = -EFAULT; >>> why -EFAULT? and not propagation of original error code? >> Yes, you are right. Thanks for the comments! >> >>> you may have posting set for part of the subindexes and unset for rest. >>> Isn't it an issue? >> >> QEMU will always set the posting for all the sub-indexes for MSI/MSIx, >> once the guest updates the configuration of some sub-indexes, KVM will >> update it accordingly. So in which case will what you mentioned above >> happen? Was pointing out you handle the case where kvm_arch_vfio_update_pi_irte could fail and you still continue looping thru the other indexes. So theoretically you could have a mixed of non posted IRQs and posted IRQs. Best Regards Eric > > QEMU is just one userspace, not necessarily the only userspace. The > kernel shouldn't expect a specific userspace behavior. > >>>> + goto free_gsi; >>>> + } >>>> + } >>>> + } >>>> +#endif >>>> + >>>> + ret = 0; >>>> + >>>> +free_gsi: >>>> + kfree(gsi); >>>> + >>>> +put_vfio_device: >>>> + kvm_vfio_put_vfio_device(vdev); >>>> + return ret; >>>> +} >>>> + >>>> static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg) >>>> { >>>> int32_t __user *argp = (int32_t __user *)(unsigned long)arg; >>>> @@ -456,6 +549,11 @@ static int kvm_vfio_set_device(struct kvm_device >>> *kdev, long attr, u64 arg) >>>> case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: >>>> ret = kvm_vfio_control_irq_forward(kdev, attr, argp); >>>> break; >>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POSTING >>>> + case KVM_DEV_VFIO_DEVICE_POSTING_IRQ: >>>> + ret = kvm_vfio_set_pi(kdev, argp); >>>> + break; >>>> +#endif >>>> default: >>>> ret = -ENXIO; >>>> } >>>> @@ -511,6 +609,11 @@ static int kvm_vfio_has_attr(struct kvm_device >>> *dev, >>>> case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: >>>> return 0; >>>> #endif >>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POSTING >>>> + case KVM_DEV_VFIO_DEVICE_POSTING_IRQ: >>>> + return 0; >>>> +#endif >>>> + >>>> } >>>> break; >>>> } >>>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html