On 1/24/23 00:38, Sean Christopherson wrote: > On Mon, Jan 23, 2023, Huang, Kai wrote: >> On Mon, 2023-01-23 at 15:03 +0100, Vlastimil Babka wrote: >>> On 12/22/22 01:37, Huang, Kai wrote: >>>>>> I argue that this page pinning (or page migration prevention) is not >>>>>> tied to where the page comes from, instead related to how the page will >>>>>> be used. Whether the page is restrictedmem backed or GUP() backed, once >>>>>> it's used by current version of TDX then the page pinning is needed. So >>>>>> such page migration prevention is really TDX thing, even not KVM generic >>>>>> thing (that's why I think we don't need change the existing logic of >>>>>> kvm_release_pfn_clean()). >>>>>> >>>> This essentially boils down to who "owns" page migration handling, and sadly, >>>> page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle page >>>> migration by itself -- it's just a passive receiver. >>>> >>>> For normal pages, page migration is totally done by the core-kernel (i.e. it >>>> unmaps page from VMA, allocates a new page, and uses migrate_pape() or a_ops- >>>>> migrate_page() to actually migrate the page). >>>> In the sense of TDX, conceptually it should be done in the same way. The more >>>> important thing is: yes KVM can use get_page() to prevent page migration, but >>>> when KVM wants to support it, KVM cannot just remove get_page(), as the core- >>>> kernel will still just do migrate_page() which won't work for TDX (given >>>> restricted_memfd doesn't have a_ops->migrate_page() implemented). >>>> >>>> So I think the restricted_memfd filesystem should own page migration handling, >>>> (i.e. by implementing a_ops->migrate_page() to either just reject page migration >>>> or somehow support it). >>> >>> While this thread seems to be settled on refcounts already, >>> >> >> I am not sure but will let Sean/Paolo to decide. > > My preference is whatever is most performant without being hideous :-) > >>> just wanted >>> to point out that it wouldn't be ideal to prevent migrations by >>> a_ops->migrate_page() rejecting them. It would mean cputime wasted (i.e. >>> by memory compaction) by isolating the pages for migration and then >>> releasing them after the callback rejects it (at least we wouldn't waste >>> time creating and undoing migration entries in the userspace page tables >>> as there's no mmap). Elevated refcount on the other hand is detected >>> very early in compaction so no isolation is attempted, so from that >>> aspect it's optimal. >> >> I am probably missing something, > > Heh, me too, I could have sworn that using refcounts was the least efficient way > to block migration. Well I admit that due to my experience with it, I do mostly consider migration through memory compaction POV, which is a significant user of migration on random pages that's not requested by userspace actions on specific ranges. And compaction has in isolate_migratepages_block(): /* * Migration will fail if an anonymous page is pinned in memory, * so avoid taking lru_lock and isolating it unnecessarily in an * admittedly racy check. */ mapping = page_mapping(page); if (!mapping && page_count(page) > page_mapcount(page)) goto isolate_fail; so that prevents migration of pages with elevated refcount very early, before they are even isolated, so before migrate_pages() is called. But it's true there are other sources of "random pages migration" - numa balancing, demotion in lieu of reclaim... and I'm not sure if all have such early check too. Anyway, whatever is decided to be a better way than elevated refcounts, would ideally be checked before isolation as well, as that's the most efficient way. >> but IIUC the checking of refcount happens at very last stage of page migration too