On 2023.07.13 18:27, Xiaoyao Li wrote:
> On 7/13/2023 2:57 PM, Zhi Wang wrote:
>> On Thu, 13 Jul 2023 10:50:36 +0800
>> Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
>>
>>> On 2023.07.13 02:14, Zhi Wang wrote:
>>>> On Fri, 7 Jul 2023 14:17:58 +0800
>>>> Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
>>>>> Every time the guest writes the msr of MSR_IA32_TSC_DEADLINE, a vm-exit
>>>>> occurs and the host side handles it. However, many of these vm-exits are
>>>>> unnecessary because the timer is often overwritten before it expires.
>>>>>
>>>>> v : write to msr of tsc deadline
>>>>> | : timer armed by tsc deadline
>>>>>
>>>>> v v v v v | | | | |
>>>>> ---------------------------------------> Time
>>>>>
>>>>> The timer armed by an msr write is overwritten before it expires, and
>>>>> the vm-exit caused by that write is wasted. The lazy tscdeadline works
>>>>> as follows,
>>>>>
>>>>> v v v v v | |
>>>>> ---------------------------------------> Time
>>>>> '- arm -'
>>>>>
>>>>
>>>> Interesting patch.
>>>>
>>>> I am a little bit confused by the chart above. It seems the writes of the
>>>> MSR, which are said to cause VM exits, are not reduced in the chart of lazy
>>>> tscdeadline; only the number of arms is reduced. And the benefit of
>>>> lazy tscdeadline is said to come from "less vm exit". Maybe it is better
>>>> to improve the chart a little bit to help people jump into the idea
>>>> easily?
>>>
>>> Thanks so much for your comment and sorry for my poor chart.
>>>
>>
>> You don't have to say sorry here. :) Save it for later when you actually
>> break something.
>>
>>> Let me try to rework the chart.
>>>
>>> Before this patch, every time the guest starts or modifies a hrtimer, we need to write the msr of tsc deadline,
>>> a vm-exit occurs and the host arms a hv or sw timer for it.
>>>
>>> w: write msr
>>> x: vm-exit
>>> t: hv or sw timer
>>>
>>>          Guest
>>>                   w
>>>          ---------------------------------------> Time
>>>          Host     x          t
>>>
>>> However, in some workloads that need to set up timers frequently, the msr of tscdeadline is usually overwritten
>>> many times before the timer expires. And every time we modify the tscdeadline, a vm-exit occurs.
>>>
>>> 1. write to msr with t0
>>>
>>>          Guest
>>>                   w0
>>>          ----------------------------------------> Time
>>>          Host     x0         t0
>>>
>>> 2. write to msr with t1
>>>
>>>          Guest
>>>                        w1
>>>          ------------------------------------------> Time
>>>          Host          x1         t0->t1
>>>
>>> 3. write to msr with t2
>>>
>>>          Guest
>>>                             w2
>>>          ------------------------------------------> Time
>>>          Host               x2         t1->t2
>>>
>>> 4. write to msr with t3
>>>
>>>          Guest
>>>                                  w3
>>>          ------------------------------------------> Time
>>>          Host                    x3         t2->t3
>>>
>>> What this patch wants to do is to eliminate the vm-exits x1, x2 and x3, as follows.
>>>
>>> Firstly, we have two fields shared between guest and host, as in other pv features:
>>>  - armed, the value of tscdeadline that has a timer on the host side, only updated by the __host__ side
>>>  - pending, the next value of tscdeadline, only updated by the __guest__ side
>>>
>>> 1. write to msr with t0
>>>
>>>          armed   : t0
>>>          pending : t0
>>>
>>>          Guest
>>>                   w0
>>>          ----------------------------------------> Time
>>>          Host     x0         t0
>>>
>>>    A vm-exit occurs and the host arms a timer for t0.
>>>
>>> 2. write to msr with t1
>>>
>>>          armed   : t0
>>>          pending : t1
>>>
>>>          Guest
>>>                        w1
>>>          ------------------------------------------> Time
>>>          Host                     t0
>>>
>>>    The tsc deadline that has already been armed, namely t0, is smaller than t1, so there is no need
>>>    to write the msr; just update pending.
>>>
>>> 3. write to msr with t2
>>>
>>>          armed   : t0
>>>          pending : t2
>>>
>>>          Guest
>>>                             w2
>>>          ------------------------------------------> Time
>>>          Host                     t0
>>>
>>>    Similar to step 2, just update the pending field with t2, no vm-exit.
>>>
>>> 4. write to msr with t3
>>>
>>>          armed   : t0
>>>          pending : t3
>>>
>>>          Guest
>>>                                  w3
>>>          ------------------------------------------> Time
>>>          Host                     t0
>>>
>>>    Similar to step 2, just update the pending field with t3, no vm-exit.
>>>
>>> 5. t0 expires, arm t3
>>>
>>>          armed   : t3
>>>          pending : t3
>>>
>>>          Guest
>>>          ------------------------------------------> Time
>>>          Host                     t0 ------> t3
>>>
>>>    When t0 fires, the host checks the pending field and re-arms a timer based on it.
>>>
>>> Here is the core idea of this patch ;)
>>>
>>
>> That's much better. Please keep this in the cover letter in the next RFC.
>>
>> My concern about this approach is: it might slightly affect timing
>> sensitive workloads in the guest, as the approach merges the deadline
>> interrupts. The guest might see fewer deadline interrupts than before. It
>> might be better to have a comparison of the number of deadline interrupts
>> in the cover letter.
>
> I don't think the guest will get fewer deadline interrupts, since the deadline is always updated before the timer expires.
>
> However, the host will get more deadline interrupts because the timer for t0 is not disarmed when a new deadline (t1, t2, t3) is programmed.
>

I forgot to suppress the injection of the local timer interrupt for t0 in this version.
This will be fixed in the V3 patchset. But there is still a vm-exit from the
preemption timer for t0 ...

The worst case is: the guest programs t0 and then t1; t1's vm-exit due to the msr write
is avoided, but t0's preemption timer vm-exit replaces it. In the other cases, there
should still be a net saving in vm-exits.

Thanks
Jianchao
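
For reference, the armed/pending steps quoted above can be modelled in plain user-space C. This is only a minimal sketch under my own assumptions, not code from the patchset: the struct, the function names and the return-value convention are all hypothetical, and it already includes the "do not inject for t0" behaviour that is only planned for V3. A deadline of 0 is treated as "no timer armed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared area; field names follow the description above. */
struct lazy_tscdeadline {
    uint64_t armed;   /* deadline the host has a timer for, host-written only */
    uint64_t pending; /* latest deadline the guest wants, guest-written only */
};

/* Host side of a trapped msr write: arm a timer for the new deadline. */
void host_handle_msr_write(struct lazy_tscdeadline *s, uint64_t deadline)
{
    s->armed = deadline;
}

/*
 * Guest side of programming a deadline.
 * Returns true if a real msr write (and therefore a vm-exit) was needed.
 */
bool guest_set_deadline(struct lazy_tscdeadline *s, uint64_t deadline)
{
    s->pending = deadline;

    /* A host timer that fires no later than the new deadline already
     * exists; it will pick up 'pending' when it fires, so skip the msr. */
    if (s->armed != 0 && s->armed <= deadline)
        return false;

    host_handle_msr_write(s, deadline);
    return true;
}

/*
 * Host side when the armed timer fires.
 * Returns true if the timer interrupt should be injected into the guest,
 * false if the host only re-armed the timer for a later pending deadline.
 */
bool host_timer_fired(struct lazy_tscdeadline *s)
{
    assert(s->armed != 0); /* a timer can only fire if one was armed */

    if (s->pending > s->armed) {
        s->armed = s->pending; /* step 5: t0 expires, arm t3 */
        return false;
    }

    s->armed = 0; /* deadline reached, nothing left armed */
    return true;
}
```

Replaying t0..t3 through guest_set_deadline() gives one vm-exit for the msr write (t0) and none for t1..t3; the first host_timer_fired() call then corresponds to the extra preemption timer exit for t0 mentioned above, and only the second one delivers the interrupt.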