Re: [PATCH] vfio/type1: unpin PageReserved page

David Hildenbrand <david@xxxxxxxxxx> · Fri, 1 Mar 2024 11:57:16 +0100

On 29.02.24 23:04, Alex Williamson wrote:
On Tue, 27 Feb 2024 13:25:56 -0700
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

On Tue, 27 Feb 2024 11:27:08 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 26.02.24 18:32, Alex Williamson wrote:
On Tue, 27 Feb 2024 01:14:54 +0800
Yisheng Xie <ethan.xys@xxxxxxxxxxxxxxxxx> wrote:

在 2024/2/27 00:14, Alex Williamson 写道:
On Tue, 27 Feb 2024 00:01:06 +0800
Yisheng Xie<ethan.xys@xxxxxxxxxxxxxxxxx>  wrote:

We meet a warning as following:
    WARNING: CPU: 99 PID: 1766859 at mm/gup.c:209 try_grab_page.part.0+0xe8/0x1b0
    CPU: 99 PID: 1766859 Comm: qemu-kvm Kdump: loaded Tainted: GOE  5.10.134-008.2.x86_64 #1
                                                                      ^^^^^^^^

Does this issue reproduce on mainline?  Thanks,

I have check the code of mainline, the logical seems the same as my
version.

so I think it can reproduce if i understand correctly.

I obviously can't speak to what's in your 5.10.134-008.2 kernel, but I
do know there's a very similar issue resolved in v6.0 mainline and
included in v5.10.146 of the stable tree.  Please test.  Thanks,

This commit, to be precise:

commit 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4
Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
Date:   Mon Aug 29 21:05:40 2022 -0600

      vfio/type1: Unpin zero pages

      There's currently a reference count leak on the zero page.  We increment
      the reference via pin_user_pages_remote(), but the page is later handled
      as an invalid/reserved page, therefore it's not accounted against the
      user and not unpinned by our put_pfn().

      Introducing special zero page handling in put_pfn() would resolve the
      leak, but without accounting of the zero page, a single user could
      still create enough mappings to generate a reference count overflow.

      The zero page is always resident, so for our purposes there's no reason
      to keep it pinned.  Therefore, add a loop to walk pages returned from
      pin_user_pages_remote() and unpin any zero pages.

BUT

in the meantime, we also have

commit c8070b78751955e59b42457b974bea4a4fe00187
Author: David Howells <dhowells@xxxxxxxxxx>
Date:   Fri May 26 22:41:40 2023 +0100

      mm: Don't pin ZERO_PAGE in pin_user_pages()

      Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
      to it from the page tables and make unpin_user_page*() correspondingly
      ignore a ZERO_PAGE when unpinning.  We don't want to risk overrunning a
      zero page's refcount as we're only allowed ~2 million pins on it -
      something that userspace can conceivably trigger.

      Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.

So the unpin_user_page_* won't do anything with the shared zeropage.

(likely, we could revert 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4)

Yes, according to the commit log it seems like the unpin is now just
wasted work since v6.5.  Thanks!

I dusted off an old unit test for mapping the zeropage through vfio and
started working on posting a revert for 873aefb376bb but I actually
found that this appears to be resolved even before c8070b787519.  I
bisected it to:

commit 84209e87c6963f928194a890399e24e8ad299db1
Author: David Hildenbrand <david@xxxxxxxxxx>
Date:   Wed Nov 16 11:26:48 2022 +0100

     mm/gup: reliable R/O long-term pinning in COW mappings

     We already support reliable R/O pinning of anonymous memory.
     However, assume we end up pinning (R/O long-term) a pagecache page
     or the shared zeropage inside a writable private ("COW") mapping.
     The next write access will trigger a write-fault and replace the
     pinned page by an exclusive anonymous page in the process page
     tables to break COW: the pinned page no longer corresponds to the
     page mapped into the process' page table.
     Now that FAULT_FLAG_UNSHARE can break COW on anything mapped into a
     COW mapping, let's properly break COW first before R/O long-term
     pinning something that's not an exclusive anon page inside a COW
     mapping. FAULT_FLAG_UNSHARE will break COW and map an exclusive
     anon page instead that can get pinned safely.

     With this change, we can stop using FOLL_FORCE|FOLL_WRITE for
     reliable R/O long-term pinning in COW mappings.

[...]

     Note 3: For users that use FOLL_LONGTERM right now without
     FOLL_WRITE, such as VFIO, we'd now no longer pin the shared
     zeropage. Instead, we'd populate exclusive anon pages that we can
     pin. There was a concern that this could affect the memlock limit
     of existing setups.

     For example, a VM running with VFIO could run into the memlock
     limit and fail to run. However, we essentially had the same
     behavior already in commit 17839856fd58 ("gup: document and work
     around "COW can break either way" issue") which got merged into
     some enterprise distros, and there were not any such complaints. So
     most probably, we're fine.

Oh, I almost forgot about that one :)

Indeed, 84209e87c696 was v6.2 and c8070b787519 was v6.5.

... and c8070b787519 was primarily concerned about !FOLL_LONGTERM usage, 
so that makes sense that they would still run into zeropages.

For vfio, 84209e87c696 did the trick.

--
Cheers,

David / dhildenb