On Fri, 6 Aug 2021 19:02:39 +0000 Sean Christopherson wrote: > On Fri, Aug 06, 2021, Kai Huang wrote: > > On Thu, 5 Aug 2021 16:06:41 +0000 Sean Christopherson wrote: > > > On Thu, Aug 05, 2021, Kai Huang wrote: > > > > On Fri, 2 Jul 2021 15:04:47 -0700 isaku.yamahata@xxxxxxxxx wrote: > > > > > From: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> > > > > > @@ -2020,6 +2032,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, > > > > > sp = kvm_mmu_alloc_page(vcpu, direct); > > > > > > > > > > sp->gfn = gfn; > > > > > + sp->gfn_stolen_bits = gfn_stolen_bits; > > > > > sp->role = role; > > > > > hlist_add_head(&sp->hash_link, sp_list); > > > > > if (!direct) { > > > > > @@ -2044,6 +2057,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, > > > > > return sp; > > > > > } > > > > > > > > > > > > Sorry for replying old thread, > > > > > > Ha, one month isn't old, it's barely even mature. > > > > > > > but to me it looks weird to have gfn_stolen_bits > > > > in 'struct kvm_mmu_page'. If I understand correctly, above code basically > > > > means that GFN with different stolen bit will have different 'struct > > > > kvm_mmu_page', but in the context of this patch, mappings with different > > > > stolen bits still use the same root, > > > > > > You're conflating "mapping" with "PTE". The GFN is a per-PTE value. Yes, there > > > is a final GFN that is representative of the mapping, but more directly the final > > > GFN is associated with the leaf PTE. > > > > > > TDX effectively adds the restriction that all PTEs used for a mapping must have > > > the same shared/private status, so mapping and PTE are somewhat interchangeable > > > when talking about stolen bits (the shared bit), but in the context of this patch, > > > the stolen bits are a property of the PTE. > > > > Yes it is a property of PTE, this is the reason that I think it's weird to have > > stolen bits in 'struct kvm_mmu_page'. Shouldn't stolen bits in 'struct > > kvm_mmu_page' imply that all PTEs (whether leaf or not) share the same > > stolen bit? > > No, the stolen bits are the property of the shadow page. I'm using "PTE" above > to mean "PTE for this shadow page", not PTEs within the shadow page, if that makes > sense. I see. > > > > Back to your statement, it's incorrect. PTEs (effectively mappings in TDX) with > > > different stolen bits will _not_ use the same root. kvm_mmu_get_page() includes > > > the stolen bits in both the hash lookup and in the comparison, i.e. restores the > > > stolen bits when looking for an existing shadow page at the target GFN. > > > > > > @@ -1978,9 +1990,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, > > > role.quadrant = quadrant; > > > } > > > > > > - sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]; > > > + sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn_and_stolen)]; > > > for_each_valid_sp(vcpu->kvm, sp, sp_list) { > > > - if (sp->gfn != gfn) { > > > + if ((sp->gfn | sp->gfn_stolen_bits) != gfn_and_stolen) { > > > collisions++; > > > continue; > > > } > > > > > > > This only works for non-root table, but there's only one single > > vcpu->arch.mmu->root_hpa, we don't have an array to have one root for each > > stolen bit, i.e. do a loop in mmu_alloc_direct_roots(), so effectively all > > stolen bits share one single root. > > Yes, and that's absolutely the required behavior for everything except for TDX > with its two EPTPs. E.g. any other implement _must_ reject CR3s that set stolen > gfn bits. OK. I was thinking gfn_stolen_bits for 'struct kvm_mmu_page' for the table pointed by CR3 should still make sense. > > > > > which means gfn_stolen_bits doesn't make a lot of sense at least for root > > > > page table. > > > > > > It does make sense, even without a follow-up patch. In Rick's original series, > > > stealing a bit for execute-only guest memory, there was only a single root. And > > > except for TDX, there can only ever be a single root because the shared EPTP isn't > > > usable, i.e. there's only the regular/private EPTP. > > > > > > > Instead, having gfn_stolen_bits in 'struct kvm_mmu_page' only makes sense in > > > > the context of TDX, since TDX requires two separate roots for private and > > > > shared mappings. > > > > > > > So given we cannot tell whether the same root, or different roots should be > > > > used for different stolen bits, I think we should not add 'gfn_stolen_bits' to > > > > 'struct kvm_mmu_page' and use it to determine whether to allocate a new table > > > > for the same GFN, but should use a new role (i.e role.private) to determine. > > > > > > A new role would work, too, but it has the disadvantage of not automagically > > > working for all uses of stolen bits, e.g. XO support would have to add another > > > role bit. > > > > For each purpose of particular stolen bit, a new role can be defined. For > > instance, in __direct_map(), if you see stolen bit is TDX shared bit, you don't > > set role.private (otherwise set role.private). For XO, if you see the stolen > > bit is XO, you set role.xo. > > > > We already have info of 'gfn_stolen_mask' in vcpu, so we just need to make sure > > all code paths can find the actual stolen bit based on sp->role and vcpu (I > > haven't gone through all though, assuming the annoying part is rmap). > > Yes, and I'm not totally against the idea, but I'm also not 100% sold on it either, > yet... The idea of a 'private' flag is growing on me. > > If we're treating the shared bit as an attribute bit, which we are, then it's > effectively an extension of role.access. Ditto for XO. > > And looking at the code, I think it would be an improvement for TDX, as all of > the is_private_gfn() calls that operate on a shadow page would be simplified and > optimized as they wouldn't have to lookup both gfn_stolen_bits and the vcpu->kvm > mask of the shared bit. > > Actually, the more I think about it, the more I like it. For TDX, there's no > risk of increased hash collisions, as we've already done messed up if there's a > shared vs. private collision. > > And for XO, if it ever makes it way upstream, I think we should flat out disallow > referencing XO addresses in non-leaf PTEs, i.e. make the XO permission bit reserved > in non-leaf PTEs. That would avoid any theoretical problems with the guest doing > something stupid by polluting all its upper level PxEs with XO. Collisions would > be purely limited to the case where the guest is intentionally creating an alternate > mapping, which should be a rare event (or the guest is comprosied, which is also > hopefully a rare event). > > My main motivation is 'gfn_stolen_bits' doesn't quite make sense for 'struct kvm_mmu_page' for root, plus it seems it's a little bit redundant at first glance. So could we have your final suggestion? :) Thanks, -Kai