On 11/8/2016 11:16 PM, Alex Williamson wrote: > On Tue, 8 Nov 2016 21:56:29 +0530 > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > >> On 11/8/2016 5:15 AM, Alex Williamson wrote: >>> On Sat, 5 Nov 2016 02:40:45 +0530 >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: >>> >> ... >>>> >>>> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb) >>> >>> Is the expectation here that this is a generic notifier for all >>> vfio->mdev signaling? That should probably be made clear in the mdev >>> API to avoid vendor drivers assuming their notifier callback only >>> occurs for unmaps, even if that's currently the case. >>> >> >> Ok. Adding comment about notifier callback in mdev_device which is part >> of next patch. >> >> ... >> >>>> mutex_lock(&iommu->lock); >>>> >>>> - if (!iommu->external_domain) { >>>> + /* Fail if notifier list is empty */ >>>> + if ((!iommu->external_domain) || (!iommu->notifier.head)) { >>>> ret = -EINVAL; >>>> goto pin_done; >>>> } >>>> @@ -867,6 +870,11 @@ unlock: >>>> /* Report how much was unmapped */ >>>> unmap->size = unmapped; >>>> >>>> + if (unmapped && iommu->external_domain) >>>> + blocking_notifier_call_chain(&iommu->notifier, >>>> + VFIO_IOMMU_NOTIFY_DMA_UNMAP, >>>> + unmap); >>> >>> This is after the fact, there's already a gap here where pages are >>> unpinned and the mdev device is still running. >> >> Oh, there is a bug here, now unpin_pages() take user_pfn as argument and >> find vfio_dma. If its not found, it doesn't unpin pages. We have to call >> this notifier before vfio_remove_dma(). But if we call this before >> vfio_remove_dma() there will be deadlock since iommu->lock is already >> held here and vfio_iommu_type1_unpin_pages() will also try to hold >> iommu->lock. >> If we want to call blocking_notifier_call_chain() before >> vfio_remove_dma(), sequence should be: >> >> unmapped += dma->size; >> mutex_unlock(&iommu->lock); >> if (iommu->external_domain)) { >> struct vfio_iommu_type1_dma_unmap nb_unmap; >> >> nb_unmap.iova = dma->iova; >> nb_unmap.size = dma->size; >> blocking_notifier_call_chain(&iommu->notifier, >> VFIO_IOMMU_NOTIFY_DMA_UNMAP, >> &nb_unmap); >> } >> mutex_lock(&iommu->lock); >> vfio_remove_dma(iommu, dma); > > It seems like it would be worthwhile to have the rb-tree rooted in the > vfio-dma, then we only need to call the notifier if there are pages > pinned within that vfio-dma (ie. the rb-tree is not empty). We can > then release the lock call the notifier, re-acquire the lock, and > BUG_ON if the rb-tree still is not empty. We might get duplicate pfns > between separate vfio_dma structs, but as I mentioned in other replies, > that seems like an exception that we don't need to optimize for. > If we don't optimize for the case where iova from different vfio_dma are mapped to same pfn and we would not consider this case for page accounting then: - have rb tree of pinned iova, where key would be iova, in each vfio_dma structure. - iova tracking structure would have iova and ref_count only. - page accounting would only count number of iova's in rb_tree, case where different iova could map to same pfn would not be considered in this implementation for now. - vfio_unpin_pages() would have user_pfn and pfn as input, we would validate that iova exist in rb tree and trust vendor driver that corresponding pfn is correct, there is no validation of pfn. If want validate pfn, call GUP, verify pfn and call put_pfn(). - In .release() or .detach_group() path, if there are entries in this rb tree, call GUP again using that iova, get pfn and then call put_pfn(pfn) for ref_count+1 times. This is because we are not keeping pfn in our tracking logic. Does this sound reasonable? Thanks, Kirti >>> The notifier needs to >>> happen prior to that and I suspect that we need to validate that we >>> have no remaining external pfn references within this vfio_dma block. >>> It seems like we need to root our pfn tracking in the vfio_dma so that >>> we can see that it's empty after the notifier chain and BUG_ON if not. >> >> There is no way to find pfns from that iova range with current >> implementation. We can have this validate if we go with linear array of >> iova to track pfns. > > Right, I was still hoping to avoid storing the pfn even with the > array/page-table approach though, ask the mm layer for the mapping > again. Is that too much overhead? Maybe the page table could store > the phys addr and we could use PAGE_MASK to store the reference count > so that each entry is still only 8bytes(?) > >>> I would also add some enforcement that external pinning is only enabled >>> when vfio_iommu_type1 is configured for v2 semantics (ie. we only >>> support unmaps exactly matching previous maps). >>> >> >> Ok I'll add that check. >> >> Thanks, >> Kirti > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html