On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +                   int prot, dma_addr_t *pfn_base)
>> +{
>> +        struct vfio_iommu *iommu = iommu_data;
>> +        struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +        int i = 0, ret = 0;
>> +        long retpage;
>> +        dma_addr_t remote_vaddr = 0;
>> +        dma_addr_t *pfn = pfn_base;
>> +        struct vfio_dma *dma;
>> +
>> +        if (!iommu || !pfn_base)
>> +                return -EINVAL;
>> +
>> +        if (list_empty(&iommu->domain_list)) {
>> +                ret = -EINVAL;
>> +                goto pin_done;
>> +        }
>> +
>> +        get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +        // Return error if vGPU domain doesn't exist
>
> No c++ style comments please.
>
>> +        if (!domain_vgpu) {
>> +                ret = -EINVAL;
>> +                goto pin_done;
>> +        }
>> +
>> +        for (i = 0; i < npage; i++) {
>> +                struct vfio_vgpu_pfn *p;
>> +                struct vfio_vgpu_pfn *lpfn;
>> +                unsigned long tpfn;
>> +                dma_addr_t iova;
>> +
>> +                mutex_lock(&iommu->lock);
>> +
>> +                iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +                dma = vfio_find_dma(iommu, iova, 0 /* size */);
>> +                if (!dma) {
>> +                        mutex_unlock(&iommu->lock);
>> +                        ret = -EINVAL;
>> +                        goto pin_done;
>> +                }
>> +
>> +                remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +                retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +                                                  (long)1, prot, &tpfn);
>> +                mutex_unlock(&iommu->lock);
>> +                if (retpage <= 0) {
>> +                        WARN_ON(!retpage);
>> +                        ret = (int)retpage;
>> +                        goto pin_done;
>> +                }
>> +
>> +                pfn[i] = tpfn;
>> +
>> +                mutex_lock(&domain_vgpu->lock);
>> +
>> +                // search if pfn exist
>> +                if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +                        atomic_inc(&p->ref_count);
>> +                        mutex_unlock(&domain_vgpu->lock);
>> +                        continue;
>> +                }
>
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here. We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface. So there's really no point at all to making type1
> become this unsupportable. In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
>
>> +
>> +                // add to pfn_list
>> +                lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +                if (!lpfn) {
>> +                        ret = -ENOMEM;
>> +                        mutex_unlock(&domain_vgpu->lock);
>> +                        goto pin_done;
>> +                }
>> +                lpfn->vmm_va = remote_vaddr;
>> +                lpfn->iova = iova;
>> +                lpfn->pfn = pfn[i];
>> +                lpfn->npage = 1;
>> +                lpfn->prot = prot;
>> +                atomic_inc(&lpfn->ref_count);
>> +                vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +                mutex_unlock(&domain_vgpu->lock);
>> +        }
>> +
>> +        ret = i;
>> +
>> +pin_done:
>> +        return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware.  It just, as you said in another mail, "rather than
programming them into an IOMMU for a device, it simply stores the
translations for use by later requests".
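
If that understanding is right, the api-only domain amounts to a
pinned-translation store that later requests consult.  A rough sketch of
that idea, purely for illustration (struct api_only_entry and
api_only_gfn_to_hpfn() are made-up names, not anything in this patch):

#include <linux/rbtree.h>

/*
 * Illustrative only: nothing is programmed into hardware, the domain
 * just remembers which guest pfn was pinned to which host pfn so that
 * a later request can look the translation up again.
 */
struct api_only_entry {
        unsigned long gfn;      /* guest pfn */
        unsigned long hpfn;     /* host pfn it was pinned to */
        struct rb_node node;    /* tree keyed by gfn */
};

/* A later request consults the stored translation, not the hardware. */
static unsigned long api_only_gfn_to_hpfn(struct rb_root *root,
                                          unsigned long gfn)
{
        struct rb_node *n = root->rb_node;

        while (n) {
                struct api_only_entry *e =
                        rb_entry(n, struct api_only_entry, node);

                if (gfn < e->gfn)
                        n = n->rb_left;
                else if (gfn > e->gfn)
                        n = n->rb_right;
                else
                        return e->hpfn;
        }
        return 0;       /* 0 here simply meaning "never pinned" */
}

That is more or less what the pfn_list in this patch is tracking.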
Such an api-only domain imposes a constraint on the gfx driver: the
hardware IOMMU must be disabled.  Otherwise, if an IOMMU is present,
the gfx driver will eventually program the hardware IOMMU with the
IOVA returned by pci_map_page() or dma_map_page(); meanwhile, the
IOMMU backend for vGPU only maintains GPA <-> HPA translations,
without any knowledge of the hardware IOMMU.  How is the device model
supposed to get an IOVA for a given GPA (and thereby a given HPA, via
the IOMMU backend here)?

If things go as guessed above, and vfio_pin_pages() does what it
indicates (pin the pages and translate vaddr to PFN), it will be very
difficult for the device model to figure out:

1. for a given GPA, how to avoid calling dma_map_page() multiple times?
2. for which page to call dma_unmap_page()?

(A rough sketch of the bookkeeping this would require is appended
below.)

--
Thanks,
Jike
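
To make questions 1 and 2 concrete, here is (very roughly) the kind of
per-device tracking the device model would apparently have to invent on
its own if it called dma_map_page() itself.  struct gpa_iova_entry,
vgpu_dma_map_gpa() and vgpu_dma_unmap_gpa() are made-up names for this
sketch, not anything proposed in the patch, and the DMA direction and
error handling are simplified:

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/hashtable.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/slab.h>

/* One entry per guest pfn the device model has handed to the IOMMU. */
struct gpa_iova_entry {
        unsigned long gfn;      /* guest pfn (GPA >> PAGE_SHIFT) */
        unsigned long hpfn;     /* host pfn from vfio_pin_pages() */
        dma_addr_t iova;        /* what dma_map_page() returned */
        int refs;
        struct hlist_node node;
};

static DEFINE_HASHTABLE(gpa_iova_table, 10);
static DEFINE_MUTEX(gpa_iova_lock);

/* Map a guest pfn once; repeated callers only take a reference. */
static int vgpu_dma_map_gpa(struct device *dev, unsigned long gfn,
                            unsigned long hpfn, dma_addr_t *iova)
{
        struct gpa_iova_entry *e;
        dma_addr_t addr;

        mutex_lock(&gpa_iova_lock);
        hash_for_each_possible(gpa_iova_table, e, node, gfn) {
                if (e->gfn == gfn) {
                        e->refs++;
                        *iova = e->iova;
                        mutex_unlock(&gpa_iova_lock);
                        return 0;
                }
        }

        addr = dma_map_page(dev, pfn_to_page(hpfn), 0, PAGE_SIZE,
                            DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, addr)) {
                mutex_unlock(&gpa_iova_lock);
                return -EFAULT;
        }

        e = kzalloc(sizeof(*e), GFP_KERNEL);
        if (!e) {
                dma_unmap_page(dev, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
                mutex_unlock(&gpa_iova_lock);
                return -ENOMEM;
        }
        e->gfn = gfn;
        e->hpfn = hpfn;
        e->iova = addr;
        e->refs = 1;
        hash_add(gpa_iova_table, &e->node, gfn);
        mutex_unlock(&gpa_iova_lock);

        *iova = addr;
        return 0;
}

/* Drop a reference; only the last user actually unmaps the page. */
static void vgpu_dma_unmap_gpa(struct device *dev, unsigned long gfn)
{
        struct gpa_iova_entry *e;

        mutex_lock(&gpa_iova_lock);
        hash_for_each_possible(gpa_iova_table, e, node, gfn) {
                if (e->gfn == gfn) {
                        if (--e->refs == 0) {
                                dma_unmap_page(dev, e->iova, PAGE_SIZE,
                                               DMA_BIDIRECTIONAL);
                                hash_del(&e->node);
                                kfree(e);
                        }
                        break;
                }
        }
        mutex_unlock(&gpa_iova_lock);
}

This is only to make the two questions above concrete, not a proposal.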