Re: [PATCH] mm/gup: Use try_grab_page() instead of try_grab_folio() in gup slow path

Peter Xu <peterx@xxxxxxxxxx> · Thu, 27 Jun 2024 14:55:27 -0400

On Thu, Jun 27, 2024 at 04:53:08PM +0800, yangge1116@xxxxxxx wrote:
> From: yangge <yangge1116@xxxxxxx>
> 
> If a large number of CMA memory are configured in system (for
> example, the CMA memory accounts for 50% of the system memory),
> starting a SEV virtual machine will fail. During starting the SEV
> virtual machine, it will call pin_user_pages_fast(..., FOLL_LONGTERM,
> ...) to pin memory. Normally if a page is present and in CMA area,
> pin_user_pages_fast() will first call __get_user_pages_locked() to
> pin the page in CMA area, and then call
> check_and_migrate_movable_pages() to migrate the page from CMA area
> to non-CMA area. But the current code calling __get_user_pages_locked()
> will fail, because it call try_grab_folio() to pin page in gup slow
> path.
> 
> The commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages
> != NULL"") uses try_grab_folio() in gup slow path, which seems to be
> problematic because try_grap_folio() will check if the page can be
> longterm pinned. This check may fail and cause __get_user_pages_lock()
> to fail. However, these checks are not required in gup slow path,
> seems we can use try_grab_page() instead of try_grab_folio(). In
> addition, in the current code, try_grab_page() can only add 1 to the
> page's refcount. We extend this function so that the page's refcount
> can be increased according to the parameters passed in.
> 
> The following log reveals it:
> 
> [  464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
> [  464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
> [  464.325477] RIP: 0010:__get_user_pages+0x423/0x520
> [  464.325515] Call Trace:
> [  464.325520]  <TASK>
> [  464.325523]  ? __get_user_pages+0x423/0x520
> [  464.325528]  ? __warn+0x81/0x130
> [  464.325536]  ? __get_user_pages+0x423/0x520
> [  464.325541]  ? report_bug+0x171/0x1a0
> [  464.325549]  ? handle_bug+0x3c/0x70
> [  464.325554]  ? exc_invalid_op+0x17/0x70
> [  464.325558]  ? asm_exc_invalid_op+0x1a/0x20
> [  464.325567]  ? __get_user_pages+0x423/0x520
> [  464.325575]  __gup_longterm_locked+0x212/0x7a0
> [  464.325583]  internal_get_user_pages_fast+0xfb/0x190
> [  464.325590]  pin_user_pages_fast+0x47/0x60
> [  464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
> [  464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
> 
> Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: yangge <yangge1116@xxxxxxx>

Thanks for the report and the fix proposed.  This is unfortunate..

It's just that I worry this may not be enough, as thp slow gup isn't the
only one using try_grab_folio().  There're also hugepd and memfd pinning
(which just got queued, again).

I suspect both of them can also hit a cma chunk here, and fail whenever
they shouldn't have.

The slight complexity resides in the hugepd path where it right now shares
with fast-gup.  So we may potentially need something similiar to what Yang
used to introduce in this patch:

https://lore.kernel.org/r/20240604234858.948986-2-yang@xxxxxxxxxxxxxxxxxxxxxx

So as to identify whether the hugepd gup is slow or fast, and we should
only let the fast gup fail on those.

Let me also loop them in on the other relevant discussion.

Thanks,

-- 
Peter Xu