On Thu, Jun 27, 2024 at 11:55 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, Jun 27, 2024 at 04:53:08PM +0800, yangge1116@xxxxxxx wrote:
> > From: yangge <yangge1116@xxxxxxx>
> >
> > If a large amount of CMA memory is configured in the system (for
> > example, CMA memory accounts for 50% of system memory), starting a
> > SEV virtual machine will fail. While starting the SEV virtual
> > machine, it calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to
> > pin memory. Normally, if a page is present and in a CMA area,
> > pin_user_pages_fast() will first call __get_user_pages_locked() to
> > pin the page in the CMA area, and then call
> > check_and_migrate_movable_pages() to migrate the page from the CMA
> > area to a non-CMA area. But currently the call to
> > __get_user_pages_locked() fails, because it calls try_grab_folio()
> > to pin the page in the gup slow path.
> >
> > Commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages
> > != NULL"") uses try_grab_folio() in the gup slow path, which is
> > problematic because try_grab_folio() checks whether the page can be
> > longterm pinned. This check may fail and cause
> > __get_user_pages_locked() to fail. However, this check is not
> > required in the gup slow path, so it seems we can use
> > try_grab_page() instead of try_grab_folio(). In addition, in the
> > current code, try_grab_page() can only add 1 to the page's
> > refcount. We extend this function so that the page's refcount can
> > be increased according to the parameters passed in.
> >
> > The following log reveals it:
> >
> > [  464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
> > [  464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
> > [  464.325477] RIP: 0010:__get_user_pages+0x423/0x520
> > [  464.325515] Call Trace:
> > [  464.325520]  <TASK>
> > [  464.325523]  ? __get_user_pages+0x423/0x520
> > [  464.325528]  ? __warn+0x81/0x130
> > [  464.325536]  ? __get_user_pages+0x423/0x520
> > [  464.325541]  ? report_bug+0x171/0x1a0
> > [  464.325549]  ? handle_bug+0x3c/0x70
> > [  464.325554]  ? exc_invalid_op+0x17/0x70
> > [  464.325558]  ? asm_exc_invalid_op+0x1a/0x20
> > [  464.325567]  ? __get_user_pages+0x423/0x520
> > [  464.325575]  __gup_longterm_locked+0x212/0x7a0
> > [  464.325583]  internal_get_user_pages_fast+0xfb/0x190
> > [  464.325590]  pin_user_pages_fast+0x47/0x60
> > [  464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
> > [  464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
> >
> > Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: yangge <yangge1116@xxxxxxx>
>
> Thanks for the report and the fix proposed. This is unfortunate..
>
> It's just that I worry this may not be enough, as thp slow gup isn't the
> only one using try_grab_folio(). There're also hugepd and memfd pinning
> (which just got queued, again).
>
> I suspect both of them can also hit a cma chunk here, and fail whenever
> they shouldn't have.
>
> The slight complexity resides in the hugepd path, where it right now
> shares code with fast-gup. So we may potentially need something similar
> to what Yang used to introduce in this patch:
>
> https://lore.kernel.org/r/20240604234858.948986-2-yang@xxxxxxxxxxxxxxxxxxxxxx
>
> So as to identify whether the hugepd gup is slow or fast, and we should
> only let the fast gup fail on those.
>
> Let me also loop them in on the other relevant discussion.

Thanks, Peter. I was actually typing the same thing...

Yes, I agree my patch should be able to solve the problem. At the
beginning I thought it was just a pure cleanup patch, but it seems it is
more useful than that. I'm going to port my patch to the latest
mm-unstable, then post it.

>
> Thanks,
>
> --
> Peter Xu
>