Yang, On Thu, Jun 27, 2024 at 03:14:13PM -0700, Yang Shi wrote: > The try_grab_folio() is supposed to be used in fast path and it elevates > folio refcount by using add ref unless zero. We are guaranteed to have > at least one stable reference in slow path, so the simple atomic add > could be used. The performance difference should be trivial, but the > misuse may be confusing and misleading. This first paragraph is IMHO misleading itself.. I think we should mention upfront the important bit, on the user impact. Here IMO the user impact should be: Linux may fail longterm pin in some releavnt paths when applied over CMA reserved blocks. And if to extend a bit, that include not only slow-gup but also the new memfd pinning, because both of them used try_grab_folio() which used to be only for fast-gup. It's great this patch renamed try_grab_folio() to try_grab_folio_fast(), I think that definitely helps on reducing the abuse in the future. However then with that the subject becomes misleading, because it says "do not call try_grab_folio()" however after this patch we keep using it. Maybe rename the subject to "mm: Fix longterm pin on slow gup and memfd pin regress"? > > In another thread [1] a kernel warning was reported when pinning folio > in CMA memory when launching SEV virtual machine. The splat looks like: > > [ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520 > [ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6 > [ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520 > [ 464.325515] Call Trace: > [ 464.325520] <TASK> > [ 464.325523] ? __get_user_pages+0x423/0x520 > [ 464.325528] ? __warn+0x81/0x130 > [ 464.325536] ? __get_user_pages+0x423/0x520 > [ 464.325541] ? report_bug+0x171/0x1a0 > [ 464.325549] ? handle_bug+0x3c/0x70 > [ 464.325554] ? exc_invalid_op+0x17/0x70 > [ 464.325558] ? asm_exc_invalid_op+0x1a/0x20 > [ 464.325567] ? __get_user_pages+0x423/0x520 > [ 464.325575] __gup_longterm_locked+0x212/0x7a0 > [ 464.325583] internal_get_user_pages_fast+0xfb/0x190 > [ 464.325590] pin_user_pages_fast+0x47/0x60 > [ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd] > [ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd] > > Per the analysis done by yangge, when starting the SEV virtual machine, > it will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the > memory. But the page is in CMA area, so fast GUP will fail then > fallback to the slow path due to the longterm pinnalbe check in > try_grab_folio(). > The slow path will try to pin the pages then migrate them out of CMA > area. But the slow path also uses try_grab_folio() to pin the page, > it will also fail due to the same check then the above warning > is triggered. > > [1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge1116@xxxxxxx/ > > Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") > Cc: <stable@xxxxxxxxxxxxxxx> [6.6+] > Reported-by: yangge <yangge1116@xxxxxxx> > Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx> The patch itself looks mostly ok to me. There's still some "cleanup" part mangled together, e.g., the real meat should be avoiding the folio_is_longterm_pinnable() check in relevant paths. The rest (e.g. switch slow-gup / memfd pin to use folio_ref_add() not try_get_folio(), and renames) could be good cleanups. So a smaller fix might be doable, but again I don't have a strong opinion here. Thanks, -- Peter Xu