On 2023.07.13 21:32, Xiaoyao Li wrote:
> On 7/13/2023 10:50 AM, Wang Jianchao wrote:
>>
>>
>> On 2023.07.13 02:14, Zhi Wang wrote:
>>> On Fri, 7 Jul 2023 14:17:58 +0800
>>> Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
>>>
>>>> Hi
>>>>
>>>> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
>>>> Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
>>>> and the host side handles it. However, many of these vm-exits are
>>>> unnecessary because the timer is often overwritten before it expires.
>>>>
>>>> v : write to msr of tsc deadline
>>>> | : timer armed by tsc deadline
>>>>
>>>>          v v v v v        | | | | |
>>>> ---------------------------------------> Time
>>>>
>>>> The timer armed by an msr write is overwritten before it expires, and
>>>> the vm-exits caused by it are wasted. The lazy tscdeadline works as follows,
>>>>
>>>>          v v v v v        |       |
>>>> ---------------------------------------> Time
>>>>                           '- arm -'
>>>>
>>>
>>> Interesting patch.
>>>
>>> I am a little bit confused by the chart above. It seems the writes to the MSR,
>>> which are said to cause VM exits, are not reduced in the chart of lazy
>>> tscdeadline; only the number of arms is getting smaller. And the benefit of
>>> lazy tscdeadline is said to come from "less vm exit". Maybe it is better
>>> to improve the chart a little bit to help people jump into the idea
>>> easily?
>>
>> Thanks so much for your comment, and sorry for my poor chart.
>>
>> Let me try to rework the chart.
>>
>> Before this patch, every time the guest starts or modifies a hrtimer, we need to write the msr of tsc deadline,
>> a vm-exit occurs and the host arms a hv or sw timer for it.
>>
>>
>> w: write msr
>> x: vm-exit
>> t: hv or sw timer
>>
>>
>> Guest
>>          w
>> ---------------------------------------> Time
>> Host     x              t
>>
>> However, in some workloads that need to set up timers frequently, the msr of tscdeadline is usually overwritten
>> many times before the timer expires. And every time we modify the tscdeadline, a vm-exit occurs
>>
>>
>> 1. write to msr with t0
>>
>> Guest
>>          w0
>> ----------------------------------------> Time
>> Host     x0             t0
>>
>> 2. write to msr with t1
>> Guest
>>              w1
>> ------------------------------------------> Time
>> Host         x1          t0->t1
>>
>>
>> 3. write to msr with t2
>> Guest
>>                 w2
>> ------------------------------------------> Time
>> Host            x2          t1->t2
>>
>> 4. write to msr with t3
>> Guest
>>                     w3
>> ------------------------------------------> Time
>> Host                x3          t2->t3
>>
>>
>>
>> What this patch wants to do is to eliminate the vm-exits x1, x2 and x3, as follows,
>>
>>
>> Firstly, we have two fields shared between guest and host, like other pv features, namely,
>>  - armed, the value of tscdeadline that has a timer on the host side, only updated by the __host__ side
>>  - pending, the next value of tscdeadline, only updated by the __guest__ side
>>
>>
>> 1. write to msr with t0
>>
>>              armed   : t0
>>              pending : t0
>> Guest
>>          w0
>> ----------------------------------------> Time
>> Host     x0             t0
>>
>> vm-exit occurs and arms a timer for t0 on the host side
>
> What's the initial value of @armed and @pending?

Both of them are zero.

@armed is only updated by the host.
@pending is updated by the guest.

The guest side checks @armed; if it is zero, it jumps to wrmsrl.

>
>> 2. write to msr with t1
>>
>>              armed   : t0
>>              pending : t1
>>
>> Guest
>>              w1
>> ------------------------------------------> Time
>> Host                    t0
>>
>> the value of tsc deadline that has been armed, namely t0, is smaller than t1, so there is no need to write
>> to the msr; just update pending
>
> if t1 < t0, then it triggers the vm exit, right?

Yes. If the new tsc deadline value (t1 here) is smaller than @armed, the guest jumps to wrmsrl.

> And in this case, I think @armed will be updated to t1. What about pending? will it get updated to t1 or not?

Yes, the guest jumps to wrmsrl and causes a vm-exit; the host side will update @armed and re-arm the timer.

Thanks
Jianchao

>
>>
>> 3. write to msr with t2
>>
>>              armed   : t0
>>              pending : t2
>> Guest
>>                 w2
>> ------------------------------------------> Time
>> Host                    t0
>> Similar to step 2, just update the pending field with t2, no vm-exit
>>
>>
>> 4. write to msr with t3
>>
>>              armed   : t0
>>              pending : t3
>>
>> Guest
>>                     w3
>> ------------------------------------------> Time
>> Host                    t0
>> Similar to step 2, just update the pending field with t3, no vm-exit
>>
>>
>> 5. t0 expires, arm t3
>>
>>              armed   : t3
>>              pending : t3
>>
>>
>> Guest
>> ------------------------------------------> Time
>> Host                    t0 ------> t3
>>
>> When t0 fires, the host checks the pending field and re-arms a timer based on it.
>>
>>
>> Here is the core idea of this patch ;)
>>
>>
>> Thanks
>> Jianchao
>>
>>>
>>>> The 1st timer is responsible for arming the next timer. When the armed
>>>> timer expires, it will check pending and arm a new timer.
>>>>
>>>> In the netperf test with TCP_RR on loopback, this lazy_tscdeadline can
>>>> reduce vm-exits significantly.
>>>>
>>>>                          Close               Open
>>>> --------------------------------------------------------
>>>> VM-Exit
>>>>              sum         12617503            5815737
>>>>             intr      0% 37023            0% 33002
>>>>            cpuid      0% 1                0% 0
>>>>             halt     19% 2503932         47% 2780683
>>>>        msr-write     79% 10046340        51% 2966824
>>>>            pause      0% 90               0% 84
>>>>    ept-violation      0% 584              0% 336
>>>>    ept-misconfig      0% 0                0% 2
>>>> preemption-timer      0% 29518            0% 34800
>>>> -------------------------------------------------------
>>>> MSR-Write
>>>>              sum         10046455            2966864
>>>>         apic-icr     25% 2533498         93% 2781235
>>>>     tsc-deadline     74% 7512945          6% 185629
>>>>
>>>> This patchset is made and tested on 6.4.0 and includes 3 patches,
>>>>
>>>> The 1st one adds the necessary data structures for this feature
>>>> The 2nd one adds the specific msr operations between guest and host
>>>> The 3rd one is the one that makes this feature work.
>>>>
>>>> Any comment is welcome.
>>>>
>>>> Thanks
>>>> Jianchao
>>>>
>>>> Wang Jianchao (3)
>>>>   KVM: x86: add msr register and data structure for lazy tscdeadline
>>>>   KVM: x86: exchange info about lazy_tscdeadline with msr
>>>>   KVM: X86: add lazy tscdeadline support to reduce vm-exit of msr-write
>>>>
>>>>
>>>>  arch/x86/include/asm/kvm_host.h      |  10 ++++++++
>>>>  arch/x86/include/uapi/asm/kvm_para.h |   9 +++++++
>>>>  arch/x86/kernel/apic/apic.c          |  47 ++++++++++++++++++++++++++++++++++-
>>>>  arch/x86/kernel/kvm.c                |  13 ++++++++++
>>>>  arch/x86/kvm/cpuid.c                 |   1 +
>>>>  arch/x86/kvm/lapic.c                 | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>>  arch/x86/kvm/lapic.h                 |   4 +++
>>>>  arch/x86/kvm/x86.c                   |  26 ++++++++++++++++++++
>>>>  8 files changed, 229 insertions(+), 9 deletions(-)
>>>
>
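
For readers following the armed/pending steps discussed in this thread, here is a minimal user-space sketch of that protocol. It is not code from the patchset: the names struct lazy_tscdeadline, struct sim, guest_set_deadline(), host_handle_wrmsr(), host_timer_expired() and replay_steps() are all hypothetical, the vm-exit is only a counter, and clearing armed back to zero when nothing later is pending is an assumption inferred from the "initial value is zero" answer above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical shared area between guest and host. The field names follow
 * the thread; the struct name and layout are assumptions.
 */
struct lazy_tscdeadline {
    uint64_t armed;   /* deadline that has a host timer; host writes only */
    uint64_t pending; /* latest requested deadline; guest writes only */
};

struct sim {
    struct lazy_tscdeadline shared;
    unsigned long vm_exits; /* simulated exits caused by wrmsr */
};

/* Host: the simulated wrmsr exit handler (re)arms a timer for tsc. */
static void host_handle_wrmsr(struct sim *s, uint64_t tsc)
{
    s->vm_exits++;
    s->shared.armed = tsc;
}

/* Guest: returns 1 if it had to fall back to wrmsr, 0 otherwise. */
static int guest_set_deadline(struct sim *s, uint64_t tsc)
{
    s->shared.pending = tsc;
    /* No timer armed, or the new deadline is earlier than the armed one. */
    if (s->shared.armed == 0 || tsc < s->shared.armed) {
        host_handle_wrmsr(s, tsc);
        return 1;
    }
    return 0; /* the armed timer picks up pending when it fires */
}

/*
 * Host: the armed timer fired. Re-arm from pending if it is later;
 * otherwise clear armed (assumed here) so the next guest write exits.
 */
static void host_timer_expired(struct sim *s)
{
    uint64_t fired = s->shared.armed;

    s->shared.armed = (s->shared.pending > fired) ? s->shared.pending : 0;
}

/* Replays steps 1-5 with t0..t3 = 100..400; returns the exit count. */
static unsigned long replay_steps(void)
{
    struct sim s = { {0, 0}, 0 };

    guest_set_deadline(&s, 100); /* step 1: armed == 0, so wrmsr */
    guest_set_deadline(&s, 200); /* step 2: only pending changes */
    guest_set_deadline(&s, 300); /* step 3 */
    guest_set_deadline(&s, 400); /* step 4 */
    host_timer_expired(&s);      /* step 5: t0 fires, arm t3 */
    assert(s.shared.armed == 400);
    return s.vm_exits;
}
```

Calling replay_steps() walks through the five steps with a single simulated vm-exit instead of four. A write with an earlier deadline than the armed one (the t1 < t0 case asked about above) takes the wrmsr path again.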