On 9/30/2016 8:40 AM, Jike Song wrote:
> On 09/30/2016 10:58 AM, Jike Song wrote:
>> On 09/29/2016 11:06 PM, Kirti Wankhede wrote:
>>>
>>>
>>> On 9/29/2016 7:47 AM, Jike Song wrote:
>>>> +Guangrong
>>>>
>>>> On 08/25/2016 11:53 AM, Kirti Wankhede wrote:
>>>
>>> ...
>>>
>>>>> +static long vfio_iommu_type1_pin_pages(void *iommu_data,
>>>>> +                                       unsigned long *user_pfn,
>>>>> +                                       long npage, int prot,
>>>>> +                                       unsigned long *phys_pfn)
>>>>> +{
>>>>> +        struct vfio_iommu *iommu = iommu_data;
>>>>> +        struct vfio_domain *domain;
>>>>> +        int i, j, ret;
>>>>> +        long retpage;
>>>>> +        unsigned long remote_vaddr;
>>>>> +        unsigned long *pfn = phys_pfn;
>>>>> +        struct vfio_dma *dma;
>>>>> +        bool do_accounting = false;
>>>>> +
>>>>> +        if (!iommu || !user_pfn || !phys_pfn)
>>>>> +                return -EINVAL;
>>>>> +
>>>>> +        mutex_lock(&iommu->lock);
>>>>> +
>>>>> +        if (!iommu->local_domain) {
>>>>> +                ret = -EINVAL;
>>>>> +                goto pin_done;
>>>>> +        }
>>>>> +
>>>>> +        domain = iommu->local_domain;
>>>>> +
>>>>> +        /*
>>>>> +         * If an iommu-capable domain exists in the container, then all
>>>>> +         * pages are already pinned and accounted. Accounting should be
>>>>> +         * done only if there is no iommu-capable domain in the container.
>>>>> +         */
>>>>> +        do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
>>>>> +
>>>>> +        for (i = 0; i < npage; i++) {
>>>>> +                struct vfio_pfn *p;
>>>>> +                dma_addr_t iova;
>>>>> +
>>>>> +                iova = user_pfn[i] << PAGE_SHIFT;
>>>>> +
>>>>> +                dma = vfio_find_dma(iommu, iova, 0);
>>>>> +                if (!dma) {
>>>>> +                        ret = -EINVAL;
>>>>> +                        goto pin_unwind;
>>>>> +                }
>>>>> +
>>>>> +                remote_vaddr = dma->vaddr + iova - dma->iova;
>>>>> +
>>>>> +                retpage = __vfio_pin_pages_local(domain, remote_vaddr, prot,
>>>>> +                                                 &pfn[i], do_accounting);
>>>>
>>>> Hi Kirti,
>>>>
>>>> Here you call __vfio_pin_pages_local() -> vaddr_get_pfn() -> GUP regardless
>>>> of whether the vaddr is already pinned or not. That probably means, if the
>>>> caller calls vfio_pin_pages() with a GPA multiple times, you get memory
>>>> leaks.
>>>>
>>>> GUP always increases the page refcnt.
>>>>
>>>> FWIW, I would like to have the pfn_list_lock implemented with key == iova,
>>>> so you can always try to find the PFN for a given iova, and pin it only if
>>>> not found.
>>>>
>>>
>>> I didn't get how there would be a memory leak.
>>>
>>> Right, GUP increases the refcnt, so if vfio_pin_pages() is called
>>> multiple times for the same GPA, the refcnt would be incremented. In
>>> vfio_iommu_type1_pin_pages() the pinned-pages list is maintained with a
>>> ref_count. If the pfn is already in the list, ref_count is incremented,
>>> and the same is used while unpinning pages.
>>>
>>
>> Let's have a close look at vfio_unpin_pfn:
>>
>> static int vfio_unpin_pfn(struct vfio_domain *domain,
>>                           struct vfio_pfn *vpfn, bool do_accounting)
>> {
>>         __vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
>>                                     do_accounting);
>>
>>         if (atomic_dec_and_test(&vpfn->ref_count))
>>                 vfio_remove_from_pfn_list(domain, vpfn);
>>
>>         return 1;
>> }
>>
>> Here you didn't call __vfio_unpin_pages_for_mdev -- and thereby put_page --
>> vpfn->ref_count times. If page->_refcount was increased by GUP (N) times,
>> here you only set it back to (N-1).
>>

A user of vfio_pin_pages() should also call vfio_unpin_pages(), so here we
unpin the page once. If vfio_pin_pages() is called twice for the same page,
we should get vfio_unpin_pages() twice for that page as well. If users of
these APIs don't follow this, then vfio_release_domain() ->
vfio_local_unpin_all() takes care of unpinning, decrementing ref_count, and
deleting the node on (ref_count == 0) for all remaining pfns.
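To make that contract concrete, here is a minimal sketch of a well-behaved
vendor-driver caller, assuming the vfio_pin_pages()/vfio_unpin_pages()
prototypes proposed in this series (unpin takes the host pfns returned by
pin, since the tracking rbtree is keyed by pfn); my_program_dma() is a
hypothetical placeholder:

static int my_map_one_gfn(struct device *mdev_dev, unsigned long gfn)
{
        unsigned long user_pfn = gfn;   /* guest pfn, i.e. iova >> PAGE_SHIFT */
        unsigned long phys_pfn;
        long npinned;
        int ret;

        npinned = vfio_pin_pages(mdev_dev, &user_pfn, 1,
                                 IOMMU_READ | IOMMU_WRITE, &phys_pfn);
        if (npinned != 1)
                return npinned < 0 ? (int)npinned : -EFAULT;

        ret = my_program_dma(mdev_dev, gfn, phys_pfn);  /* hypothetical */
        if (ret)
                /* every successful pin must be balanced by an unpin */
                vfio_unpin_pages(mdev_dev, &phys_pfn, 1);

        return ret;
}

With that pairing, the per-pfn ref_count and the page refcount taken by GUP
both drop back to their original values, including on the error path.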
>
> What's more, since all pinned {iova, pfn} are already saved, it's
> better to consult it before calling GUP, which will get_page()
> unconditionally.

The pfn is required to unpin a page, so we use the pfn as the key for the
rbtree. vfio_pin_pages() is called with user_pfn, i.e. the iova, which
can't be used to search a pfn-keyed rbtree in an optimized way. The raw
way would be to visit each node of the rbtree and check its iova, which
would hamper performance if this is called in a performance-critical path.
So the optimized way here is to pin first, get the pfn, and then check
whether it already exists in the rbtree: if it exists, increment
ref_count; else add it to the rbtree.
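To make the lookup order concrete, here is a minimal sketch of that
pin-first scheme; the helper names and the vfio_pfn layout approximate this
series and are not a verbatim copy:

static int pin_and_track(struct vfio_domain *domain, unsigned long vaddr,
                         int prot, unsigned long *pfn)
{
        struct vfio_pfn *vpfn;
        long ret;

        /* GUP first; each successful call takes one page reference */
        ret = __vfio_pin_pages_local(domain, vaddr, prot, pfn, true);
        if (ret <= 0)
                return ret ? (int)ret : -EFAULT;

        /* only now is the pfn -- the rbtree key -- known */
        vpfn = vfio_find_pfn(domain, *pfn);
        if (vpfn) {
                atomic_inc(&vpfn->ref_count);   /* already tracked */
                return 0;
        }

        /* first pin of this pfn: add a node with ref_count = 1 */
        return vfio_add_to_pfn_list(domain, *pfn, prot);
}

Since every pin takes one GUP reference and every matching unpin does one
put_page(), the counts stay balanced without consulting the tree before GUP.

Thanks,
Kirti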