On Thu, Dec 22, 2022 at 06:15:24PM +0000, Sean Christopherson wrote:
> On Wed, Dec 21, 2022, Chao Peng wrote:
> > On Tue, Dec 20, 2022 at 08:33:05AM +0000, Huang, Kai wrote:
> > > On Tue, 2022-12-20 at 15:22 +0800, Chao Peng wrote:
> > > > On Mon, Dec 19, 2022 at 08:48:10AM +0000, Huang, Kai wrote:
> > > > > On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
> > > But for non-restricted-mem case, it is correct for KVM to decrease page's
> > > refcount after setting up mapping in the secondary mmu, otherwise the page will
> > > be pinned by KVM for normal VM (since KVM uses GUP to get the page).
> >
> > That's true. Actually even true for restrictedmem case, most likely we
> > will still need the kvm_release_pfn_clean() for KVM generic code. On one
> > side, other restrictedmem users like pKVM may not require page pinning
> > at all. On the other side, see below.
> >
> > >
> > > So what we are expecting is: for KVM if the page comes from restricted mem, then
> > > KVM cannot decrease the refcount, otherwise for normal page via GUP KVM should.
>
> No, requiring the user (KVM) to guard against lack of support for page migration
> in restricted mem is a terrible API. It's totally fine for restricted mem to not
> support page migration until there's a use case, but punting the problem to KVM
> is not acceptable. Restricted mem itself doesn't yet support page migration,
> e.g. explosions would occur even if KVM wanted to allow migration since there is
> no notification to invalidate existing mappings.
>
> > I argue that this page pinning (or page migration prevention) is not
> > tied to where the page comes from, instead related to how the page will
> > be used. Whether the page is restrictedmem backed or GUP() backed, once
> > it's used by current version of TDX then the page pinning is needed.
> > So such page migration prevention is really a TDX thing, not even a KVM
> > generic thing (that's why I think we don't need to change the existing
> > logic of kvm_release_pfn_clean()). Wouldn't it be better to let TDX code
> > (or whoever requires that) increase/decrease the refcount when it
> > populates/drops the secure EPT entries? This is exactly what the current
> > TDX code does:
>
> I agree that whether or not migration is supported should be controllable by the
> user, but I strongly disagree on punting refcount management to KVM (or TDX).
> The whole point of restricted mem is to support technologies like TDX and SNP,
> accommodating their special needs for things like page migration should be part of
> the API, not some footnote in the documentation.

I never doubted that page migration should be part of the restrictedmem
API, but it's not part of the initial implementation, as we all agreed?
Until that API is introduced, we need to find a solution to prevent page
migration for TDX. Other than refcount management, do we have any other
workable solution?

>
> It's not difficult to let the user communicate support for page migration, e.g.
> if/when restricted mem gains support, add a hook to restrictedmem_notifier_ops
> to signal support (or lack thereof) for page migration. NULL == no migration,
> non-NULL == migration allowed.

I know.

>
> We know that supporting page migration in TDX and SNP is possible, and we know
> that page migration will require a dedicated API since the backing store can't
> memcpy() the page. I don't see any reason to ignore that eventuality.

No, I'm not ignoring it. This is just about short-term page migration
prevention before that dedicated API is introduced.

>
> But again, unless I'm missing something, that's a future problem because restricted
> mem doesn't yet support page migration regardless of the downstream user.

It's true that page migration support itself is a future problem, but
page migration prevention is not, since TDX pages need to be pinned
until page migration is supported.

Thanks,
Chao
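
For illustration only, the two ideas discussed in this thread can be modelled
in a small user-space C sketch: a hook in an ops table where NULL means "no
migration", and a user (e.g. TDX) holding an extra page reference while a
secure EPT entry exists. None of the names below are real kernel API:
struct page_model, struct notifier_ops_model, secure_ept_populate(),
secure_ept_drop() and page_may_migrate() are hypothetical stand-ins, and the
"refcount == 1 means only the backing store holds the page" rule is a
simplifying assumption, not how the kernel actually decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; models only the reference count. */
struct page_model {
    int refcount; /* assumption: 1 == held by the backing store only */
};

/* Hypothetical stand-in for a notifier ops table. */
struct notifier_ops_model {
    /* NULL == no migration, non-NULL == migration allowed. */
    int (*migrate)(struct page_model *src, struct page_model *dst);
};

/* Example hook a user could register once it can handle migration. */
static int example_migrate(struct page_model *src, struct page_model *dst)
{
    (void)src;
    (void)dst;
    return 0;
}

/* The user takes an extra reference when it maps the page. */
static void secure_ept_populate(struct page_model *page)
{
    page->refcount++;
}

/* The user drops its reference when the mapping goes away. */
static void secure_ept_drop(struct page_model *page)
{
    assert(page->refcount > 1); /* must pair with a populate */
    page->refcount--;
}

/*
 * In this model the backing store migrates a page only if the user
 * registered a hook and nobody holds an extra reference on the page.
 */
static bool page_may_migrate(const struct notifier_ops_model *ops,
                             const struct page_model *page)
{
    return ops != NULL && ops->migrate != NULL && page->refcount == 1;
}
```

In this model either mechanism alone blocks migration: a NULL hook blocks it
for every page of that user, while the extra reference blocks it only for
pages that are currently mapped.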