On Tue, Jan 28, 2025, Zheyun Shen wrote:
> On AMD CPUs without ensuring cache consistency, each memory page
> reclamation in an SEV guest triggers a call to wbinvd_on_all_cpus(),
> thereby affecting the performance of other programs on the host.
>
> Typically, an AMD server may have 128 cores or more, while the SEV guest
> might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
> to bind these 8 vCPUs to specific physical CPUs.
>
> Therefore, keeping a record of the physical core numbers each time a vCPU
> runs can help avoid flushing the cache for all CPUs every time.
>
> Signed-off-by: Zheyun Shen <szy0127@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/svm/sev.c | 30 +++++++++++++++++++++++++++---
>  arch/x86/kvm/svm/svm.c |  2 ++
>  arch/x86/kvm/svm/svm.h |  5 ++++-
>  3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1ce67de9d..4b80ecbe7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -252,6 +252,27 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>  	sev->misc_cg = NULL;
>  }
>
> +void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

And now I'm very confused. v1 and v2 marked the CPU dirty in pre_sev_run(),
which AFAICT is exactly when a CPU should be recorded as having dirtied
memory. v3 fixed a bug with using get_cpu(), but otherwise was unchanged.
Tom even gave a Tested-by for v3.

Then v4 comes along and, without explanation, moves the code to vcpu_load(),
i.e. changes when CPUs are recorded from pre_sev_run() to vcpu_load(). Why?
If there's a good reason, then that absolutely, positively belongs in the
changelog and in the code as a comment. If there's no good reason, then...

Unless I hear otherwise, my plan is to move this back to pre_sev_run().

> +{
> +	/*
> +	 * To optimize cache flushes when memory is reclaimed from an SEV VM,
> +	 * track physical CPUs that enter the guest for SEV VMs and thus can
> +	 * have encrypted, dirty data in the cache, and flush caches only for
> +	 * CPUs that have entered the guest.
> +	 */
> +	cpumask_set_cpu(cpu, to_kvm_sev_info(vcpu->kvm)->wbinvd_dirty_mask);
> +}
> +
> +static void sev_do_wbinvd(struct kvm *kvm)
> +{
> +	/*
> +	 * TODO: Clear CPUs from the bitmap prior to flushing. Doing so
> +	 * requires serializing multiple calls and having CPUs mark themselves
> +	 * "dirty" if they are currently running a vCPU for the VM.
> +	 */

A comment is definitely warranted, but I don't think we should mark it as a
TODO. I'm not convinced the benefits justify the complexity, and I don't
want someone trying to "fix" the code because it has a TODO.

> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 43fa6a16e..82ec80cf4 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -112,6 +112,8 @@ struct kvm_sev_info {
>  	void *guest_req_buf;	/* Bounce buffer for SNP Guest Request input */
>  	void *guest_resp_buf;	/* Bounce buffer for SNP Guest Request output */
>  	struct mutex guest_req_mutex;	/* Must acquire before using bounce buffers */
> +	/* CPUs invoked VMRUN call wbinvd after guest memory is reclaimed */
> +	struct cpumask *wbinvd_dirty_mask;

This needs to be cpumask_var_t, as the cpumask APIs expect the mask to be
statically allocated when CONFIG_CPUMASK_OFFSTACK=n. E.g. this will hit a
NULL pointer deref, since the mask is never actually allocated:

  static __always_inline
  bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
  {
	cpumask_clear(*mask);
	return true;
  }

The wbinvd_dirty_mask name also turns out to be less than good.
In part because of the looming WBNOINVD change, but also because it kinda
sorta collides with the wbinvd_dirty_mask in kvm_vcpu_arch, which gets
really confusing when trying to read the code. I don't have any great
ideas; the best I came up with is have_run_cpus.
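
To put all of the above in one place, here's an untested sketch of what I
have in mind. Caveats: the sev_wbinvd_cpu() helper, the allocation site,
and the on_each_cpu_mask() plumbing are purely illustrative, i.e. not
necessarily what will actually get applied; only have_run_cpus,
pre_sev_run(), to_kvm_sev_info(), and sev_do_wbinvd() come from this
thread.

  /* In struct kvm_sev_info (svm.h), as cpumask_var_t, not a raw pointer: */
  cpumask_var_t have_run_cpus;	/* CPUs that have done VMRUN for this VM */

  /*
   * Allocate when SEV is enabled for the VM; zalloc_cpumask_var() does the
   * right thing for both CONFIG_CPUMASK_OFFSTACK=y and =n.
   */
  if (!zalloc_cpumask_var(&sev->have_run_cpus, GFP_KERNEL_ACCOUNT))
	return -ENOMEM;

  /*
   * In pre_sev_run(), i.e. immediately before VMRUN, record the pCPU as
   * having entered the guest and thus as potentially having encrypted,
   * dirty data in its caches.
   */
  cpumask_set_cpu(cpu, to_kvm_sev_info(svm->vcpu.kvm)->have_run_cpus);

  static void sev_wbinvd_cpu(void *data)
  {
	wbinvd();
  }

  static void sev_do_wbinvd(struct kvm *kvm)
  {
	/*
	 * Don't clear CPUs from the mask prior to flushing; doing so would
	 * require serializing flushes against CPUs that are concurrently
	 * doing VMRUN, and the benefits don't justify the complexity.
	 */
	on_each_cpu_mask(to_kvm_sev_info(kvm)->have_run_cpus,
			 sev_wbinvd_cpu, NULL, 1);
  }

And the mask obviously needs a free_cpumask_var() in the VM destruction
path.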