On 09/30/2016 07:44 PM, Kirti Wankhede wrote:
> On 9/30/2016 8:40 AM, Jike Song wrote:
>> On 09/30/2016 10:58 AM, Jike Song wrote:
>>> On 09/29/2016 11:06 PM, Kirti Wankhede wrote:
>>>>
>>>>
>>>> On 9/29/2016 7:47 AM, Jike Song wrote:
>>>>> +Guangrong
>>>>>
>>>>> On 08/25/2016 11:53 AM, Kirti Wankhede wrote:
>>>>
>>>> ...
>>>>
>>>>>> +static long vfio_iommu_type1_pin_pages(void *iommu_data,
>>>>>> +				       unsigned long *user_pfn,
>>>>>> +				       long npage, int prot,
>>>>>> +				       unsigned long *phys_pfn)
>>>>>> +{
>>>>>> +	struct vfio_iommu *iommu = iommu_data;
>>>>>> +	struct vfio_domain *domain;
>>>>>> +	int i, j, ret;
>>>>>> +	long retpage;
>>>>>> +	unsigned long remote_vaddr;
>>>>>> +	unsigned long *pfn = phys_pfn;
>>>>>> +	struct vfio_dma *dma;
>>>>>> +	bool do_accounting = false;
>>>>>> +
>>>>>> +	if (!iommu || !user_pfn || !phys_pfn)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	mutex_lock(&iommu->lock);
>>>>>> +
>>>>>> +	if (!iommu->local_domain) {
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto pin_done;
>>>>>> +	}
>>>>>> +
>>>>>> +	domain = iommu->local_domain;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * If iommu capable domain exist in the container then all pages are
>>>>>> +	 * already pinned and accounted. Accouting should be done if there is no
>>>>>> +	 * iommu capable domain in the container.
>>>>>> +	 */
>>>>>> +	do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
>>>>>> +
>>>>>> +	for (i = 0; i < npage; i++) {
>>>>>> +		struct vfio_pfn *p;
>>>>>> +		dma_addr_t iova;
>>>>>> +
>>>>>> +		iova = user_pfn[i] << PAGE_SHIFT;
>>>>>> +
>>>>>> +		dma = vfio_find_dma(iommu, iova, 0);
>>>>>> +		if (!dma) {
>>>>>> +			ret = -EINVAL;
>>>>>> +			goto pin_unwind;
>>>>>> +		}
>>>>>> +
>>>>>> +		remote_vaddr = dma->vaddr + iova - dma->iova;
>>>>>> +
>>>>>> +		retpage = __vfio_pin_pages_local(domain, remote_vaddr, prot,
>>>>>> +						 &pfn[i], do_accounting);
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>> Here you call __vfio_pin_pages_local() -> vaddr_get_pfn() -> GUP regardless
>>>>> of whether the vaddr is already pinned or not.
>>>>> That probably means, if the caller
>>>>> calls vfio_pin_pages() with a GPA multiple times, you get memory leaks.
>>>>>
>>>>> GUP always increases the page refcnt.
>>>>>
>>>>> FWIW, I would like to have the pfn_list_lock implemented with key == iova,
>>>>> so you can always try to find the PFN for a given iova, and pin it only if
>>>>> not found.
>>>>>
>>>>
>>>> I didn't get how there would be a memory leak.
>>>>
>>>> Right, GUP increases refcnt, so if vfio_pin_pages() is called
>>>> multiple times for the same GPA, refcnt would be incremented. In
>>>> vfio_iommu_type1_pin_pages() a pinned pages list is maintained with a
>>>> ref_count. If the pfn is already in the list, ref_count is incremented,
>>>> and the same is used while unpinning pages.
>>>>
>>>
>>> Let's have a close look at vfio_unpin_pfn:
>>>
>>> static int vfio_unpin_pfn(struct vfio_domain *domain,
>>>                           struct vfio_pfn *vpfn, bool do_accounting)
>>> {
>>> 	__vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
>>> 				    do_accounting);
>>>
>>> 	if (atomic_dec_and_test(&vpfn->ref_count))
>>> 		vfio_remove_from_pfn_list(domain, vpfn);
>>>
>>> 	return 1;
>>> }
>>>
>>> Here you didn't call __vfio_unpin_pages_for_mdev -- thereby put_page -- for
>>> vpfn->ref_count times. If page->_refcount was increased by GUP (N) times, here
>>> you only set it back to (N-1).
>>>

> User of vfio_pin_pages() should call vfio_unpin_pages() also, so here
> we unpin it once. If vfio_pin_pages() is called twice for the same page, we
> should get vfio_unpin_pages() twice for the same page.
>

If this is the deliberate design, why do you need a 'ref_count'? You can
simply drop the 'ref_count' and blame the caller for pinning/unpinning a
different number of times.

> If users of these APIs don't follow this, then
> vfio_release_domain() -> vfio_local_unpin_all() takes care of unpinning,
> decrementing ref_count and deleting the node on (ref_count == 0) for all
> remaining pfns.
>

Here you did pay attention to the "caller doesn't follow this" situation.
However, dealing with 'ref_count' in vfio-iommu is not enough: memory is leaked.

>> What's more, since all pinned {iova, pfn} pairs are already saved, it's better to
>> consult them before calling GUP, which will get_page() unconditionally.
>

> pfn is required to unpin a page, so we have pfn as the key for the rbtree.
> vfio_pin_pages() is called with user_pfn or iova, which can't be used to
> search the rbtree by iova in an optimized way. The raw way would be to go to
> each node of the rbtree and check its iova, which would hamper performance
> if this is called in a performance-critical path.
> So here the optimized way is to first pin it, get the pfn, and check if it
> already exists in the rbtree. If it exists, increment ref_count; else add it
> to the rbtree.
>

Of course pfn is required to unpin a page, I 100% agree. But that doesn't
change the argument: using iova as the key instead, you can still store the
pfn along with it.

By the way, calling GUP unconditionally hurts more than searching an rbtree.

--
Thanks,
Jike