On Tue, 30 Aug 2022 09:59:33 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 30.08.22 05:05, Alex Williamson wrote:
> > There's currently a reference count leak on the zero page.  We increment
> > the reference via pin_user_pages_remote(), but the page is later handled
> > as an invalid/reserved page, therefore it's not accounted against the
> > user and not unpinned by our put_pfn().
> > 
> > Introducing special zero page handling in put_pfn() would resolve the
> > leak, but without accounting of the zero page, a single user could
> > still create enough mappings to generate a reference count overflow.
> > 
> > The zero page is always resident, so for our purposes there's no reason
> > to keep it pinned.  Therefore, add a loop to walk pages returned from
> > pin_user_pages_remote() and unpin any zero pages.
> > 
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > Reported-by: Luboslav Pivarc <lpivarc@xxxxxxxxxx>
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index db516c90a977..8706482665d1 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -558,6 +558,18 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
> >  	ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
> >  				    pages, NULL, NULL);
> >  	if (ret > 0) {
> > +		int i;
> > +
> > +		/*
> > +		 * The zero page is always resident, we don't need to pin it
> > +		 * and it falls into our invalid/reserved test so we don't
> > +		 * unpin in put_pfn().  Unpin all zero pages in the batch here.
> > +		 */
> > +		for (i = 0 ; i < ret; i++) {
> > +			if (unlikely(is_zero_pfn(page_to_pfn(pages[i]))))
> > +				unpin_user_page(pages[i]);
> > +		}
> > +
> >  		*pfn = page_to_pfn(pages[0]);
> >  		goto done;
> >  	}
> > 
> 
> As discussed offline, for the shared zeropage (that's not even
> refcounted when mapped into a process), this makes perfect sense to me.
> 
> Good question raised by Sean if ZONE_DEVICE pages might similarly be
> problematic. But for them, we cannot simply always unpin here.

What sort of VM mapping would give me ZONE_DEVICE pages?  Thanks,

Alex