On Wed, Oct 20, 2021, Sergey Senozhatsky wrote:
> On (21/10/20 00:32), Sergey Senozhatsky wrote:
> >  static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_clock_data data;
> > @@ -6169,6 +6189,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >  	case KVM_X86_SET_MSR_FILTER:
> >  		r = kvm_vm_ioctl_set_msr_filter(kvm, argp);
> >  		break;
> > +	case KVM_SET_MMU_PREFETCH: {
> > +		u64 val;
> > +
> > +		r = -EFAULT;
> > +		if (copy_from_user(&val, argp, sizeof(val)))
> > +			goto out;
> > +		r = kvm_arch_mmu_pte_prefetch(kvm, val);
> > +		break;
> > +	}
>
> A side question: is there any value in turning this into a per-VCPU ioctl?
> So that, say, on heterogeneous systems big cores can prefetch more than
> little cores, for instance.

I don't think so?  If anything, such behavior should probably be tied to the
pCPU, not vCPU.  Though I'm guessing the difference in optimal prefetch size
between big and little cores is in the noise.

I suspect the optimal prefetch size is more dependent on the guest workload
than the core it's running on.  There's likely a correlation between the core
size and the workload, but for that to have any meaning the vCPU would have to
be affined to a core (or set of cores), i.e. having the behavior tied to the
pCPU as opposed to the vCPU would work just as well.

If the optimal setting is based on the speed of the core, not the workload,
then per-pCPU is again preferable as it "works" regardless of vCPU affinity.
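
To make the "regardless of vCPU affinity" point concrete, here is a rough userspace sketch, not kernel code and not part of the patch. Everything in it is hypothetical: the toy_* names, NR_PCPUS, the default, and the per-core values are made up purely for illustration, and the plain array only stands in for what would be a per-CPU variable in the kernel. The only thing it shows is that when the count is keyed by the pCPU, a vCPU picks up the right value wherever it is scheduled, with no per-vCPU state to keep in sync.

```c
/* Hypothetical toy model: four physical CPUs, two "big" and two "little". */
#define NR_PCPUS 4

/* Hypothetical fallback used when the pCPU index is out of range. */
#define DEFAULT_PTE_PREFETCH 8

/*
 * One prefetch count per physical CPU.  The values are invented for the
 * example; in the kernel this would be a per-CPU variable, not an array.
 */
static unsigned int pcpu_pte_prefetch[NR_PCPUS] = { 16, 16, 4, 4 };

/* Toy vCPU: it only records which pCPU it is currently running on. */
struct toy_vcpu {
	int id;
	unsigned int cur_pcpu;
};

/* Look up the prefetch count via the pCPU the vCPU is running on right now. */
static unsigned int toy_pte_prefetch(const struct toy_vcpu *vcpu)
{
	if (vcpu->cur_pcpu >= NR_PCPUS)
		return DEFAULT_PTE_PREFETCH;
	return pcpu_pte_prefetch[vcpu->cur_pcpu];
}

/* Simulate the scheduler moving a vCPU to a different pCPU. */
static void toy_migrate(struct toy_vcpu *vcpu, unsigned int pcpu)
{
	vcpu->cur_pcpu = pcpu;
}
```

A vCPU that starts on pCPU 0 and is later migrated to pCPU 3 gets the first core's count and then the last core's count from the same lookup.  A per-vCPU setting would only track the core if the vCPU were pinned.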