On Mon, Mar 21, 2022 at 2:36 PM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> On Wed, 2022-03-09 at 11:07 -0800, Jim Mattson wrote:
> > On Wed, Mar 9, 2022 at 10:47 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > > On 3/9/22 19:35, Jim Mattson wrote:
> > > > I didn't think pause filtering was virtualizable, since the value of
> > > > the internal counter isn't exposed on VM-exit.
> > > >
> > > > On bare metal, for instance, assuming the hypervisor doesn't intercept
> > > > CPUID, the following code would quickly trigger a PAUSE #VMEXIT with
> > > > the filter count set to 2.
> > > >
> > > > 1:
> > > >         pause
> > > >         cpuid
> > > >         jmp 1
> > > >
> > > > Since L0 intercepts CPUID, however, L2 will exit to L0 on each loop
> > > > iteration, and when L0 resumes L2, the internal counter will be set to
> > > > 2 again. L1 will never see a PAUSE #VMEXIT.
> > > >
> > > > How do you handle this?
> > > >
> > >
> > > I would expect that the same would happen on an SMI or a host interrupt.
> > >
> > > 1:
> > >         pause
> > >         outl al, 0xb2
> > >         jmp 1
> > >
> > > In general a PAUSE vmexit will mostly benefit the VM that is pausing, so
> > > having a partial implementation would be better than disabling it
> > > altogether.
> >
> > Indeed, the APM does say, "Certain events, including SMI, can cause
> > the internal count to be reloaded from the VMCB." However, expanding
> > that set of events so much that some pause loops will *never* trigger
> > a #VMEXIT seems problematic. If the hypervisor knew that the PAUSE
> > filter may not be triggered, it could always choose to exit on every
> > PAUSE.
> >
> > Having a partial implementation is only better than disabling it
> > altogether if the L2 pause loop doesn't contain a hidden #VMEXIT to
> > L0.
> >
>
> Hi!
>
> You bring up a very valid point, which I didn't think about.
>
> However after thinking about this, I think that in practice,
> this isn't a show stopper problem for exposing this feature to the guest.
>
>
> This is what I am thinking:
>
> First lets assume that the L2 is malicious. In this case no doubt
> it can craft such a loop which will not VMexit on PAUSE.
> But that isn't a problem - instead of this guest could have just used NOP
> which is not possible to intercept anyway - no harm is done.
>
> Now lets assume a non malicious L2:
>
>
> First of all the problem can only happen when a VM exit is intercepted by L0,
> and not by L1. Both above cases usually don't pass this criteria since L1 is highly
> likely to intercept both CPUID and IO port access. It is also highly unlikely
> to allow L2 direct access to L1's mmio ranges.
>
> Overall there are very few cases of deterministic vm exit which is intercepted
> by L0 but not L1. If that happens then L1 will not catch the PAUSE loop,
> which is not different much from not catching it because of not suitable
> thresholds.
>
> Also note that this is an optimization only - due to count and threshold,
> it is not guaranteed to catch all pause loops - in fact hypervisor has
> to guess these values, and update them in attempt to catch as many such
> loops as it can.
>
> I think overall it is OK to expose that feature to the guest
> and it should even improve performance in some cases - currently
> at least nested KVM intercepts every PAUSE otherwise.

Can I at least request that this behavior be documented as a KVM
virtual CPU erratum?

>
> Best regards,
>         Maxim Levitsky
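
For illustration only, the scenario discussed in this thread can be sketched as a toy Python model. It is not a description of real hardware or of KVM's code: the function name `run_pause_loop`, its parameters, and the exact decrement-and-compare rule are assumptions made for the sketch, and the pause filter threshold is ignored entirely. The only behavior it tries to capture is the one described above: the internal counter is loaded from the filter count, each PAUSE decrements it, and any exit to L0 reloads it.

```python
def run_pause_loop(filter_count, iterations, l0_exit_each_iteration):
    """Toy model of a 'pause; cpuid; jmp' loop under a PAUSE filter.

    Returns the 1-based iteration at which a PAUSE #VMEXIT would fire,
    or None if it never fires within `iterations` loop iterations.

    Assumptions (hypothetical, simplified): the counter is decremented
    once per PAUSE and the exit fires when it reaches zero; every exit
    to L0 reloads the counter from filter_count.
    """
    counter = filter_count
    for i in range(1, iterations + 1):
        # pause: decrement the internal counter, exit when it hits zero
        counter -= 1
        if counter <= 0:
            return i
        # cpuid: if L0 intercepts it, resuming L2 reloads the counter
        if l0_exit_each_iteration:
            counter = filter_count
    return None


# CPUID not intercepted: the filter fires quickly.
bare_metal = run_pause_loop(filter_count=2, iterations=1000,
                            l0_exit_each_iteration=False)

# CPUID intercepted by L0 on every iteration: the filter never fires.
nested = run_pause_loop(filter_count=2, iterations=1000,
                        l0_exit_each_iteration=True)
```

Under these assumptions the first call returns after two iterations, while the second never reports a PAUSE exit, which is the case where L1 would not see the loop at all.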