On 11/23/2017 06:06 PM, Christoffer Dall wrote:
> On Thu, Nov 23, 2017 at 05:17:00PM +0100, Paolo Bonzini wrote:
>> On 23/11/2017 17:05, Christoffer Dall wrote:
>>> For example,
>>> arm64 is about to do significant work in vcpu load/put when running a
>>> vcpu, but not when doing things like KVM_SET_ONE_REG or
>>> KVM_SET_MP_STATE.
>>
>> Out of curiosity, in what circumstances are these ioctls a hot path?
>> Especially KVM_SET_MP_STATE.
>>
>
> Perhaps my commit message was misleading; we only want to do that for
> KVM_RUN, and not for anything else.  We're already doing things like
> potentially jumping to hyp mode and flushing VMIDs which really
> shouldn't be done unless we actually plan on running a VCPU, and we're
> going to do things like setting up the timer to handle timer interrupts
> in an ISR, which doesn't make sense unless the VCPU is running.
>
> Add to that, that loading an entire VM's state onto hardware, only to
> read back a single register from hardware and return it to user
> space, doesn't really fall within optimization vs. non-optimization of
> the critical path, but is just wrong, IMHO.
>
>>> Hi all,
>>>
>>> Drew suggested this as an alternative approach to recording the ioctl
>>> number on the vcpu struct [1], as it may benefit other architectures in
>>> general.
>>>
>>> I had a look at some of the specific ioctls across architectures, but
>>> must admit that I can't easily tell which architecture-specific logic
>>> relies on having registered preempt notifiers and having called the
>>> architecture-specific load function.
>>>
>>> It would be great if you would let me know whether you think this is
>>> generally useful or whether you prefer the less invasive approach, and,
>>> in case this is useful, if you could have a look at all the vcpu ioctls
>>> for your architecture and let me know if I am being too loose or too
>>> careful in calling __vcpu_load() in this patch.
>>
>> I can suggest a third approach:
>>
>>     if (ioctl == KVM_GET_ONE_REG || ioctl == KVM_SET_ONE_REG)
>>         return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
>>
>> in kvm_vcpu_ioctl before "r = vcpu_load(vcpu);", or even better:
>>
>>     if (ioctl == KVM_GET_ONE_REG)
>>         // call kvm_arch_vcpu_get_one_reg_ioctl(vcpu, &reg);
>>         // and do copy_to_user
>>         return kvm_vcpu_get_one_reg_ioctl(vcpu, arg);
>>     if (ioctl == KVM_SET_ONE_REG)
>>         // do copy_from_user, then call
>>         // kvm_arch_vcpu_set_one_reg_ioctl(vcpu, &reg);
>>         return kvm_vcpu_set_one_reg_ioctl(vcpu, arg);
>>
>> so that the kvm_arch_vcpu_get/set_one_reg_ioctl functions are called
>> without the lock.
>>
>> Then all architectures except ARM can be switched to do
>> vcpu_load/vcpu_put in kvm_arch_vcpu_get/set_one_reg_ioctl.
>
> That doesn't solve my need, as I want to *only* do the arch vcpu_load
> for KVM_RUN; I should have been more clear in the commit message.

What about splitting arch_vcpu_load/put into two callbacks and calling
the second one only for KVM_RUN? E.g. keep arch_vcpu_load and add
arch_vcpu_load_run and arch_vcpu_unload_run.

Then every architecture can move things from arch_vcpu_load into
arch_vcpu_load_run if they are only necessary for KVM_RUN.
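
Roughly like this (untested sketch, just to show the shape; the names
follow the suggestion above, and the __weak no-op defaults are only one
way to avoid having to touch every architecture at once):

    /* include/linux/kvm_host.h */
    void kvm_arch_vcpu_load_run(struct kvm_vcpu *vcpu);
    void kvm_arch_vcpu_unload_run(struct kvm_vcpu *vcpu);

    /* virt/kvm/kvm_main.c: no-op defaults, so only architectures
     * that want the split need to implement the new hooks */
    void __weak kvm_arch_vcpu_load_run(struct kvm_vcpu *vcpu)
    {
    }

    void __weak kvm_arch_vcpu_unload_run(struct kvm_vcpu *vcpu)
    {
    }

    /* in kvm_vcpu_ioctl(): vcpu_load()/vcpu_put() stay as they are
     * and keep calling kvm_arch_vcpu_load()/kvm_arch_vcpu_put() for
     * every vcpu ioctl; only the KVM_RUN case grows the extra calls */
    case KVM_RUN:
        kvm_arch_vcpu_load_run(vcpu);   /* KVM_RUN-only setup */
        r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
        kvm_arch_vcpu_unload_run(vcpu); /* and its teardown */
        break;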
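
On the arm64 side it could then look something like this (again just a
sketch; kvm_timer_setup_run()/kvm_timer_teardown_run() are made-up
stand-ins for the actual timer work):

    /* arch/arm64/kvm: hypothetical implementation of the new hooks */
    void kvm_arch_vcpu_load_run(struct kvm_vcpu *vcpu)
    {
        /* program the timer so its interrupt can be taken in an ISR
         * while the guest runs; pointless for KVM_SET_ONE_REG,
         * KVM_SET_MP_STATE and friends */
        kvm_timer_setup_run(vcpu);
    }

    void kvm_arch_vcpu_unload_run(struct kvm_vcpu *vcpu)
    {
        kvm_timer_teardown_run(vcpu);
    }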