On 10/18/2016 10:59 PM, Alex Williamson wrote:
> On Tue, 18 Oct 2016 20:38:21 +0800
> Jike Song <jike.song@xxxxxxxxx> wrote:
>> On 10/18/2016 12:02 AM, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 15:19:01 -0700
>>> Neo Jia <cjia@xxxxxxxxxx> wrote:
>>>
>>>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
>>>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>>>> Neo Jia <cjia@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
>>>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>>>> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>>>> Jike Song <jike.song@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>>>>>>>>>>> while nVidia does.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>>>>>>>>>>> can be notified for write-protected pages when writes are happening.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It can't yet, but the API is ready for that. kvm_vfio_set_group is
>>>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>>>> Given a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>>>>>>>>>>> kvm_page_track_register_notifier. So I guess you could add a callback
>>>>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans? We discussed it briefly
>>>>>>>>>>>>>> at KVM Forum but I don't remember the details.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>>>>>>>>>>> figure out the kvm instance based on the fd.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>>>>>>>>>>> straightforward.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.
>>>>>>>>>>>
>>>>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
>>>>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
>>>>>>>>>>> thread, i think it can work. :)
>>>>>>>>>>
>>>>>>>>>> No, don't do that. There's no reason for a thread to run a single VCPU,
>>>>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
>>>>>>>>>> multiple VMs.
>>>>>>>>>>
>>>>>>>>>> Passing file descriptors around are the right way to connect subsystems.
>>>>>>>>>
>>>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>>>
>>>>>>>>> Hi Paolo & Alex,
>>>>>>>>>
>>>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>>>>>>>>> on the correct direction, I'll send the split ones. Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Jike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>>>> index bec694c..f715d37 100644
>>>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>> +#include <sys/ioctl.h>
>>>>>>>>>  #include "qemu/osdep.h"
>>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>>  #include "qemu/range.h"
>>>>>>>>>  #include "qapi/error.h"
>>>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>>>  #include "pci.h"
>>>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>>>  #include "trace.h"
>>>>>>>>>
>>>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>>>          break;
>>>>>>>>>      }
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>>>> +{
>>>>>>>>> +    int vmfd;
>>>>>>>>> +
>>>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>>>> +        return;
>>>>>>>>> +
>>>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>>>> +}
>>>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>>>> index a5a620a..8732552 100644
>>>>>>>>> --- a/hw/vfio/pci.c
>>>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>>>          return ret;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>>>> +
>>>>>>>>>      /* Get a copy of config space */
>>>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
>>>>>>>>
>>>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>>>> support. We should be able to detect from the device or vfio API
>>>>>>>> whether such a call is required. Obviously if we can use the existing
>>>>>>>> kvm-vfio device, that's the better option anyway. Thanks,
>>>>>>>
>>>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>>>> does, it needs to produce a device failure when unavailable. Thanks,
>>>>>>
>>>>>> Also, I would like to see this as an generic feature instead of
>>>>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
>>>>>> up to the vendor driver to proceed with or without it.
>>>>>
>>>>> In general this should be decided by lack of some required feature
>>>>> exclusively provided by KVM. I would not want to add a generic opt-out
>>>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>>>> that path. Thanks,
>>>>
>>>> IIUC, you are suggesting that this path should be controlled by KVM feature cap
>>>> and it will be accessible to VFIO users when such checking is satisfied.
>>>
>>> Maybe we're getting too loose with our pronouns here, I'm starting to
>>> lose track of what "this" is referring to. I agree that there's no
>>> reason for the ioctl, as proposed to be kvmgt specific. I would hope
>>> that going through the kvm-vfio device to create that linkage would
>>> eliminate that, but we'll need to see what Jike can come up with to
>>> plumb between KVM and vfio. Vendor drivers can implement their own
>>> ioctls, now that we pass them through the mdev layer, but someone needs
>>> to call those ioctls. Ideally we want something programmatic to
>>> trigger that, without requiring a user to pass an extra device
>>> parameter.
>>> Additionally, if there is any hope of making use of the
>>> device with userspace drivers other than QEMU, hard dependencies on KVM
>>> should be avoided. Thanks,
>>>
>>> Alex
>>>
>>
>> Thanks for the advice, so I cooked another patch for your comments.
>> Basically a 'void *usrdata' is added to vfio_group, external users
>> can set it (kvm) or get it (kvm or other users like kvmgt).
>>
>> BTW, in device-model, the open method will return failure to vfio-mdev
>> in case that such kvm information is not available.
>>
>> --
>> Thanks,
>> Jike
>>
>>
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index d1d70e0..6b8d1d2 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -86,6 +86,7 @@ struct vfio_group {
>>          struct mutex unbound_lock;
>>          atomic_t opened;
>>          bool noiommu;
>> +        void *usrdata;
>>  };
>>
>>  struct vfio_device {
>> @@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
>>  }
>>
>>  static
>> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>  {
>>          struct vfio_group *group;
>>
>>          mutex_lock(&vfio.group_lock);
>>          list_for_each_entry(group, &vfio.group_list, vfio_next) {
>>                  if (group->iommu_group == iommu_group) {
>> -                        vfio_group_get(group);
>
> This is wrong, we can't add our reference after we release the lock.
>

Thanks for pointing it out :)

>>                          mutex_unlock(&vfio.group_lock);
>>                          return group;
>>                  }
>> @@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>          return NULL;
>>  }
>>
>> +static
>> +struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +{
>> +        struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
>> +        if (!group)
>> +                return NULL;
>> +
>> +        vfio_group_get(group);
>
> We have no basis to get a reference here. This function cannot exist
> separate from the existing function above.
>
>> +        return group;
>> +}
>> +
>>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>>  {
>>          struct vfio_group *group;
>> @@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>
>> +void vfio_group_set_usrdata(struct vfio_group *group, void *data)
>> +{
>> +        group->usrdata = data;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>> +
>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>> +{
>> +        return group->usrdata;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>> +
>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>> +{
>> +        struct vfio_group *vfio_group;
>> +
>> +        vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>
> We actually need to use iommu_group_get() here. Kirti adds a
> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>
>> +        if (!vfio_group)
>> +                return NULL;
>> +
>> +        return vfio_group_get_usrdata(vfio_group);
>
> This operates on a group for which we have no reference.

Great to know about Kirti's work! BTW, this means the user needs to call
vfio_group_put_external_user afterwards, right?

>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
>> +
>> +
>>  /**
>>   * Sub-module support
>>   */
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 0ecae0b..712588f 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>                                            unsigned long arg);
>> +extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
>> +extern void *vfio_group_get_usrdata(struct vfio_group *group);
>> +extern void *vfio_group_get_usrdata_by_device(struct device *dev);
>> +
>>
>>  /*
>>   * Sub-module helpers
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 1dd087d..e00d401 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>>          symbol_put(vfio_group_put_external_user);
>>  }
>>
>> +static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
>> +{
>> +        void (*fn)(struct vfio_group *, void *);
>> +
>> +        fn = symbol_get(vfio_group_set_usrdata);
>> +        if (!fn)
>> +                return;
>> +
>> +        fn(group, kvm);
>> +        kvm_get_kvm(kvm);
>> +
>> +        symbol_put(vfio_group_set_usrdata);
>> +}
>> +
>>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>>  {
>>          long (*fn)(struct vfio_group *, unsigned long);
>> @@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>
>>                  kvm_vfio_update_coherency(dev);
>>
>> +                kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
>> +
>>                  return 0;
>>
>>          case KVM_DEV_VFIO_GROUP_DEL:
>> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>
>>                  kvm_vfio_update_coherency(dev);
>>
>> +                kvm_put_kvm(dev->kvm);
>> +
>>                  return ret;
>>          }
>
> How does anyone get'ing the usrdata know what it contains?

Currently only the KVM instance. Maybe we can add other data along with
flags in the future?

> Does the
> vendor driver compare it to a pointer it found elsewhere? How does the
> vendor driver generate an error back to the user if this linkage is
> necessary but unavailable?

For the data == kvm scenario, yes. I think it's only valid to use it inside
the KVM thread context; IIUC, comparing kvm->mm with current->mm does the
trick. If they are not equal, in our case, parent_ops->open() will return
-ESRCH, indicating that this mdev must be used along with KVM.

--
Thanks,
Jike
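
To make the open-time check described above concrete, here is a minimal user-space sketch. It is only an illustration under stated assumptions, not the vendor driver code: the struct definitions are simplified stand-ins for the kernel types, `struct task` models `current`, and `mock_mdev_open()` is a hypothetical name for what parent_ops->open() might do. Only the usrdata setter/getter pair follows the draft patch, and the reference counting raised in the review is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumptions, not the real layouts). */
struct mm_struct { int id; };
struct kvm { struct mm_struct *mm; };
struct task { struct mm_struct *mm; };   /* models `current` */
struct vfio_group { void *usrdata; };

/* Setter/getter pair shaped like the one in the draft patch. */
static void vfio_group_set_usrdata(struct vfio_group *group, void *data)
{
    group->usrdata = data;
}

static void *vfio_group_get_usrdata(struct vfio_group *group)
{
    return group->usrdata;
}

/*
 * Hypothetical open-time check: succeed only when the group carries a kvm
 * pointer and that kvm shares its mm with the calling task. Otherwise
 * return -ESRCH to signal that the mdev must be used along with KVM.
 */
static int mock_mdev_open(struct vfio_group *group, const struct task *cur)
{
    struct kvm *kvm = vfio_group_get_usrdata(group);

    if (!kvm || kvm->mm != cur->mm)
        return -ESRCH;

    return 0;
}
```

With a group whose usrdata was never set, or a caller whose mm differs from kvm->mm, mock_mdev_open() returns -ESRCH; once KVM has set the pointer and the caller shares its mm, it returns 0. Note that the mm comparison only validates the calling context; it does not by itself pick out one KVM instance, since the same mm_struct can have more than one struct kvm.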