On Thu, Sep 7, 2023 at 4:08 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Sep 06, 2023, Xiaoyao Li wrote:
> > On 9/6/2023 2:24 PM, Hao Peng wrote:
> > > From: Peng Hao <flyingpeng@xxxxxxxxxxx>
> > >
> > > The call of vcpu_load/put takes about 1-2us. Each
> > > kvm_arch_vcpu_create will call vcpu_load/put
> > > to initialize some fields of the vmcs, which can be
> > > delayed until the first vcpu ioctl processes
> > > that part of the vmcs, reducing calls
> > > to vcpu_load.
> >
> > What if no vcpu ioctl is called after vcpu creation?
> >
> > And will the first (it was second before this patch) vcpu_load() become
> > longer? Have you measured it?
>
> I don't think the first vcpu_load() becomes longer, this avoids an entire
> load()+put() pair by doing the initialization in the first ioctl().
>
> That said, the patch is obviously buggy, it hooks kvm_arch_vcpu_ioctl() instead
> of kvm_vcpu_ioctl(), e.g. doing KVM_RUN, KVM_SET_SREGS, etc. will cause explosions.
>
> It will also break the TSC synchronization logic in kvm_arch_vcpu_postcreate(),
> which can "race" with ioctls() as the vCPU file descriptor is accessible by
> userspace the instant it's installed into the fd tables, i.e. userspace doesn't
> have to wait for KVM_CREATE_VCPU to complete.
>

The benefit shows up when there are many cores. The hook point problem
mentioned above can still be adjusted, but the TSC synchronization problem
is difficult to deal with. Thanks.

> And I gotta imagine there are other interactions I haven't thought of off the
> top of my head, e.g. the vCPU is also reachable via kvm_for_each_vcpu(). All it
> takes is one path that touches a lazily initialized field for this to fall apart.
>
> > I don't think it worth the optimization unless a strong reason.
>
> Yeah, this is a lot of subtle complexity to shave 1-2us.