On Thu, Jun 22, 2023 at 09:55:22AM +0000, Huang, Kai wrote:
> 
> > 
> > So if we were to straight-forwardly implement that based on how TDX
> > currently handles checking for the shared bit in GPA, paired with how
> > SEV-SNP handles checking for private bit in fault flags, it would look
> > something like:
> > 
> >   bool kvm_fault_is_private(kvm, gpa, err)
> >   {
> >     /* SEV-SNP handling */
> >     if (kvm->arch.mmu_private_fault_mask)
> >       return !!(err & arch.mmu_private_fault_mask);
> > 
> >     /* TDX handling */
> >     if (kvm->arch.gfn_shared_mask)
> >       return !!(gpa & arch.gfn_shared_mask);
> 
> The logic of the two are identical.  I think they need to be converged.

I think they're just different enough that trying too hard to converge
them might obfuscate things. If the determination didn't come from 2
completely different fields (gpa vs. fault flags) maybe it could be
simplified a bit more, but having a well-defined, open-coded handler that
gets called once to set fault->is_private at initial fault time, so that
the ugliness never needs to be looked at again by KVM MMU, seems like a
good way to keep things simple through the rest of the handling.

> 
> Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
> or TDX's shared bit should be converted to some private error bit.

struct kvm_page_fault seems to be the preferred way to pass additional
params/metadata around, and the .is_private field was introduced to track
this private/shared state as part of the UPM base series:

  https://lore.kernel.org/lkml/20221202061347.1070246-9-chao.p.peng@xxxxxxxxxxxxxxx/

So it seems like unnecessary complexity to track/encode that state in
additional places rather than just encapsulating it all in
fault->is_private (or some similar field). Synthesizing all the
platform-specific handling into one well-defined value, so that KVM MMU
doesn't need to worry as much about platform-specific stuff, seems like
a good thing, and is in line with what the UPM base series was trying to
do by adding the fault->is_private field.

So all I'm really proposing is that whatever SNP and TDX end up doing
should center around setting that fault->is_private field and keeping
everything contained there. If there are better ways to handle *how*
that's done I don't have any complaints there, but moving/adding bits
to GPA/error_flags after fault time just seems unnecessary to me when
the fault->is_private field can serve that purpose just as well.

> 
> Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
> guest also has a C-bit, correct?

That's correct, but the C-bit doesn't show up in the GPA that gets passed
up to KVM during an #NPF, and instead gets encoded into the fault's
error_flags.

> 
> Or, ...
> 
> > 
> >     return false;
> >   }
> > 
> >   kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
> >   {
> >     struct kvm_page_fault fault = {
> >       ...
> >       .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
> >       ...
> 
> should we do something like:
> 
>     .is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, gpa,
>                                                         err);

We actually had exactly this in v7 of the SNP hypervisor patches:

  https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@xxxxxxx/T/#m17841f5bfdfb8350d69d78c6831dd8f3a4748638

but Sean was hoping to avoid a callback, which is why we ended up using
a bitmask in this version, since that's basically all the callback would
need to do. It's unfortunate that we don't have a common scheme between
SNP/TDX, but maybe that's still possible. I just think that whatever
that ends up being, it should live and be contained inside whatever
helper ends up setting fault->is_private.

There's some other awkwardness with a callback approach. It sort of ties
into your question about selftests so I'll address it below...

> 
> ?
> 
> >     };
> > 
> >     ...
> >   }
> > 
> > And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> > set per-KVM-instance, just like they are now with current SNP and TDX
> > patchsets, since stuff like KVM self-test wouldn't be setting those
> > masks, so it makes sense to do it per-instance in that regard.
> > 
> > But that still gets a little awkward for the KVM self-test use-case where
> > .is_private should sort of be ignored in favor of whatever the xarray
> > reports via kvm_mem_is_private().
> > 
> 
> I must have missed something.  Why does KVM self-test have impact to how does
> KVM handles private fault?

The self-tests I'm referring to here are the ones from Vishal that
shipped with v10 of Chao's UPM/fd-based private memory series, and also
as part of Sean's gmem tree:

  https://github.com/sean-jc/linux/commit/a0f5f8c911804f55935094ad3a277301704330a6

These exercise gmem/UPM handling without the need for any SNP/TDX
hardware support. They do so by "trusting" the shared/private state
that the VMM sets via KVM_SET_MEMORY_ATTRIBUTES.
So if the VMM says a page should be private, KVM MMU will treat it as
private, so we'd never get a mismatch, and KVM_EXIT_MEMORY_FAULT will
never be generated.

> 
> > In your Misc. series I believe you
> > handled this by introducing a PFERR_HASATTR_MASK bit so we can determine
> > whether existing value of fault->is_private should be
> > ignored/overwritten or not.
> > 
> > So maybe kvm_fault_is_private() needs to return an integer value
> > instead, like:
> > 
> >   enum {
> >     KVM_FAULT_VMM_DEFINED,
> >     KVM_FAULT_SHARED,
> >     KVM_FAULT_PRIVATE,
> >   }
> > 
> >   bool kvm_fault_is_private(kvm, gpa, err)
> >   {
> >     /* SEV-SNP handling */
> >     if (kvm->arch.mmu_private_fault_mask)
> >       (err & arch.mmu_private_fault_mask) ? KVM_FAULT_PRIVATE : KVM_FAULT_SHARED
> > 
> >     /* TDX handling */
> >     if (kvm->arch.gfn_shared_mask)
> >       (gpa & arch.gfn_shared_mask) ? KVM_FAULT_SHARED : KVM_FAULT_PRIVATE
> > 
> >     return KVM_FAULT_VMM_DEFINED;
> >   }
> > 
> > And then down in __kvm_faultin_pfn() we do:
> > 
> >   if (fault->is_private == KVM_FAULT_VMM_DEFINED)
> >     fault->is_private = kvm_mem_is_private(vcpu->kvm, fault->gfn);
> >   else if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
> >     return kvm_do_memory_fault_exit(vcpu, fault);
> > 
> >   if (fault->is_private)
> >     return kvm_faultin_pfn_private(vcpu, fault);
> 
> 
> What does KVM_FAULT_VMM_DEFINED mean, exactly?
> 
> Shouldn't the fault type come from _hardware_?

In the above self-test use-case, there is no reliance on hardware
support, and fault->is_private should always be treated as being
whatever was set by the VMM via KVM_SET_MEMORY_ATTRIBUTES. That's why I
proposed the KVM_FAULT_VMM_DEFINED value: to encode that case into
fault->is_private so KVM MMU can handle protected self-test VMs of this
sort.

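To make the three cases concrete, here is a rough userspace sketch of the
tri-state idea. It is only an illustration of the intent, not kernel
code: struct kvm_arch_sketch, enum fault_action, fault_privacy() and
resolve_fault() are hypothetical names made up for this example, and the
vmm_private argument stands in for whatever kvm_mem_is_private() would
report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tri-state result, mirroring the enum proposed in the quoted text. */
enum kvm_fault_privacy {
	KVM_FAULT_VMM_DEFINED,	/* no hardware hint, trust VMM attributes */
	KVM_FAULT_SHARED,
	KVM_FAULT_PRIVATE,
};

/* Hypothetical stand-in for the two per-VM masks discussed above. */
struct kvm_arch_sketch {
	uint64_t mmu_private_fault_mask;	/* SNP: bit in the error code */
	uint64_t gfn_shared_mask;		/* TDX: bit in the GPA */
};

/* Hypothetical outcome of the fault-in path, for illustration only. */
enum fault_action {
	FAULT_MAP_SHARED,
	FAULT_MAP_PRIVATE,
	FAULT_EXIT_MEMORY_FAULT,	/* models KVM_EXIT_MEMORY_FAULT */
};

/* Called once at initial fault time to classify the access. */
enum kvm_fault_privacy fault_privacy(const struct kvm_arch_sketch *arch,
				     uint64_t gpa, uint64_t err)
{
	/* SEV-SNP style: private bit reported in the fault's error code */
	if (arch->mmu_private_fault_mask)
		return (err & arch->mmu_private_fault_mask) ?
			KVM_FAULT_PRIVATE : KVM_FAULT_SHARED;

	/* TDX style: shared bit encoded in the GPA */
	if (arch->gfn_shared_mask)
		return (gpa & arch->gfn_shared_mask) ?
			KVM_FAULT_SHARED : KVM_FAULT_PRIVATE;

	/* Neither mask set (e.g. self-test VMs): defer to the VMM */
	return KVM_FAULT_VMM_DEFINED;
}

/*
 * Later, in the fault-in path: vmm_private is what the VMM set via
 * memory attributes. A hardware hint that disagrees with it results
 * in an exit to userspace; no hint means the VMM's value is used.
 */
enum fault_action resolve_fault(enum kvm_fault_privacy hint,
				bool vmm_private)
{
	bool is_private;

	if (hint == KVM_FAULT_VMM_DEFINED) {
		is_private = vmm_private;
	} else {
		is_private = (hint == KVM_FAULT_PRIVATE);
		if (is_private != vmm_private)
			return FAULT_EXIT_MEMORY_FAULT;
	}

	return is_private ? FAULT_MAP_PRIVATE : FAULT_MAP_SHARED;
}
```

With both masks left at zero, fault_privacy() always returns
KVM_FAULT_VMM_DEFINED, so resolve_fault() can never produce a mismatch
exit, which is the self-test behaviour described above.
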
In the future, these protected self-test VMs might become the basis of
a new protected VM type where some sort of guest-issued hypercall can be
used to set whether a fault should be treated as shared/private, rather
than relying on a VMM-defined value. There's some current discussion
about that here:

  https://lore.kernel.org/lkml/20230620190443.GU2244082@xxxxxxxxxxxxxxxxxxxxx/T/#me627bed3d9acf73ea882e8baa76dfcb27759c440

Going back to your callback question above, that makes things a little
awkward, since kvm_x86_ops is statically defined for both the
kvm_amd/kvm_intel modules, and either can run these self-test guests as
well as the proposed "non-CC VMs", which rely on enlightened guest
kernels instead of TDX/SNP hardware support for managing private/shared
access. So you either need to duplicate the handling for how to
determine private/shared for these other types into the
kvm_intel/kvm_amd callbacks, or have some way for the callback to say
"fall back to the common handling for self-tests and non-CC VMs". The
latter is what we implemented in v8 of this series, but Isaku suggested
it was a bit too heavyweight and proposed dropping the fall-back logic
in favor of updating the kvm_x86_ops at run-time once we know whether or
not it's a TDX/SNP guest:

  https://lkml.iu.edu/hypermail/linux/kernel/2303.2/03009.html

which could work, but it still doesn't address Sean's desire to avoid
callbacks completely, and still amounts to a somewhat convoluted way to
hide away TDX/SNP-specific bit checks for shared/private.

Rather than hide them away in callbacks that are already frowned upon by
the maintainer, I think it makes sense to "open-code" all these checks
in a common handler like kvm_fault_is_private() so we can make some
progress toward a consensus, and then iterate on it from there rather
than refining what may already be a dead-end path.

-Mike