On Mon, Mar 21, 2022 at 3:11 PM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> On Mon, 2022-03-21 at 14:59 -0700, Jim Mattson wrote:
> > On Mon, Mar 21, 2022 at 2:36 PM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> > > On Wed, 2022-03-09 at 11:07 -0800, Jim Mattson wrote:
> > > > On Wed, Mar 9, 2022 at 10:47 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > > > > On 3/9/22 19:35, Jim Mattson wrote:
> > > > > > I didn't think pause filtering was virtualizable, since the value of
> > > > > > the internal counter isn't exposed on VM-exit.
> > > > > >
> > > > > > On bare metal, for instance, assuming the hypervisor doesn't intercept
> > > > > > CPUID, the following code would quickly trigger a PAUSE #VMEXIT with
> > > > > > the filter count set to 2.
> > > > > >
> > > > > > 1:
> > > > > > pause
> > > > > > cpuid
> > > > > > jmp 1b
> > > > > >
> > > > > > Since L0 intercepts CPUID, however, L2 will exit to L0 on each loop
> > > > > > iteration, and when L0 resumes L2, the internal counter will be set to
> > > > > > 2 again. L1 will never see a PAUSE #VMEXIT.
> > > > > >
> > > > > > How do you handle this?
> > > > > >
> > > > >
> > > > > I would expect that the same would happen on an SMI or a host interrupt.
> > > > >
> > > > > 1:
> > > > > pause
> > > > > outb %al, $0xb2
> > > > > jmp 1b
> > > > >
> > > > > In general a PAUSE vmexit will mostly benefit the VM that is pausing, so
> > > > > having a partial implementation would be better than disabling it
> > > > > altogether.
> > > >
> > > > Indeed, the APM does say, "Certain events, including SMI, can cause
> > > > the internal count to be reloaded from the VMCB." However, expanding
> > > > that set of events so much that some pause loops will *never* trigger
> > > > a #VMEXIT seems problematic. If the hypervisor knew that the PAUSE
> > > > filter may not be triggered, it could always choose to exit on every
> > > > PAUSE.
> > > >
> > > > Having a partial implementation is only better than disabling it
> > > > altogether if the L2 pause loop doesn't contain a hidden #VMEXIT to
> > > > L0.
> > > >
> > >
> > > Hi!
> > >
> > > You bring up a very valid point, which I didn't think about.
> > >
> > > However, after thinking about this, I think that in practice
> > > this isn't a show-stopper for exposing this feature to the guest.
> > >
> > > This is what I am thinking:
> > >
> > > First, let's assume that L2 is malicious. In this case it can no doubt
> > > craft a loop that will not VM-exit on PAUSE.
> > > But that isn't a problem: the guest could just as well have used NOP,
> > > which cannot be intercepted anyway, so no harm is done.
> > >
> > > Now let's assume a non-malicious L2:
> > >
> > > First of all, the problem can only happen when a VM exit is intercepted by L0
> > > and not by L1. Neither of the above cases usually meets this criterion, since L1 is highly
> > > likely to intercept both CPUID and I/O port access. It is also highly unlikely
> > > to give L2 direct access to L1's MMIO ranges.
> > >
> > > Overall, there are very few deterministic VM exits that are intercepted
> > > by L0 but not by L1. If one does happen, L1 will not catch the PAUSE loop,
> > > which is not much different from missing it because of unsuitable
> > > thresholds.
> > >
> > > Also note that this is only an optimization: because of the count and threshold,
> > > it is not guaranteed to catch all pause loops. In fact, the hypervisor has
> > > to guess these values and update them in an attempt to catch as many such
> > > loops as it can.
> > >
> > > Overall, I think it is OK to expose this feature to the guest,
> > > and it should even improve performance in some cases: currently,
> > > at least nested KVM intercepts every PAUSE otherwise.
> >
> > Can I at least request that this behavior be documented as a KVM
> > virtual CPU erratum?
>
> 100%.
> Do you have a pointer to where it should be documented?

I think this will be the first KVM virtual CPU erratum documented, though
there are plenty of others that I'd like to see documented (e.g. nVMX
processes posted interrupts on emulated VM-entry, AMD's merged PMU
counters are only 48 bits wide, etc.). Maybe Paolo has some ideas?

> Best regards,
> Maxim Levitsky
>
> > > Best regards,
> > > Maxim Levitsky