stsp <stsp2@xxxxxxxxx> writes:

> 28.06.2021 13:07, Vitaly Kuznetsov wrote:
>> stsp <stsp2@xxxxxxxxx> writes:
>>
>>> 28.06.2021 10:20, Vitaly Kuznetsov wrote:
>>>> Stas Sergeev <stsp2@xxxxxxxxx> writes:
>>>>
>>>>> When returning to user, the special care is taken about the
>>>>> exception that was already injected to VMCS but not yet to guest.
>>>>> cancel_injection removes such exception from VMCS. It is set as
>>>>> pending, and if the user does KVM_SET_REGS, it gets completely canceled.
>>>>>
>>>>> This didn't happen though, because the vcpu->arch.exception.injected
>>>>> and vcpu->arch.exception.pending were forgotten to update in
>>>>> cancel_injection. As the result, KVM_SET_REGS didn't cancel out
>>>>> anything, and the exception was re-injected on the next KVM_RUN,
>>>>> even though the guest registers (like EIP) were already modified.
>>>>> This was leading to an exception coming from the "wrong place".
>>>> It shouldn't be that hard to reproduce this in selftests, I
>>>> believe.
>>> Unfortunately the problem happens only on core2 CPU. I believe the reason
>>> is perhaps that more modern CPUs do not go to software for the exception
>>> injection?
>> Hm, I've completely missed that from the original description. As I read
>> it, 'cancel_injection' path in vcpu_enter_guest() is always broken when
>> vcpu->arch.exception.injected is set as we forget to clear it...
>
> Yes, cancel_injection is supposed to
> be always broken indeed. But there
> are a few more things to it.
> Namely:
> - Other CPUs do not seem to exhibit
> that path. My guess here is that they
> just handle the exception in hardware,
> without returning to KVM for that. I
> am not sure why Core2 vmexits per
> each page fault. Is it incapable of
> handling the PF in hardware, or maybe
> some other bug is around?

Wild guess: no EPT support and running on shadow pages?

>
> - Even if you followed the broken
> path, in most cases everything is still
> fine: the exception will just be re-injected.
> The unfortunate scenario is when you
> have _TIF_SIGPENDING at exactly
> right place. Then you go to user-space,
> and the user-space is unlucky to use
> SET_REGS right here. These conditions
> are not very likely to happen. I wrote a
> test-case for it, but it involves the entire
> buildroot setup and you need to wait
> a bit while it is trying to trigger the race.

Maybe there's an easier way to trigger imminent exit to userspace which
doesn't involve

>
>
>>>> 'exception.injected' can even be set through
>>>> KVM_SET_VCPU_EVENTS and then we call KVM_SET_REGS.
>>> Does this mean I shouldn't add WARN_ON_ONCE()?
>> WARN_ON_ONCE() is fine IMO in case there's no valid case when
>> 'vcpu->arch.exception.injected' is set during __set_regs().
>
> But you said:
>
>> 'exception.injected' can even be set through
>> KVM_SET_VCPU_EVENTS and then we call KVM_SET_REGS.
>
> ... which makes such scenario valid?
>

We should not add userspace-triggerable WARNs in kernel, right. I was
not sure if the WARN you add stays triggerable post-patch.

>
>
>>>
>>>> Alternatively, we can
>>>> trigger a real exception from the guest. Could you maybe add something
>>>> like this to tools/testing/selftests/kvm/x86_64/set_sregs_test.c?
>>> Even if you have the right CPU to reproduce that (Core2), you also
>>> need the _TIF_SIGPENDING at the right moment to provoke the cancel_injection
>>> path. This is like triggering a race. If you don't get _TIF_SIGPENDING
>>> then it will just re-enter guest and inject the exception properly.
>> I'd like to understand the hardware dependency first. Is it possible
>> that the exception which causes the problem is not triggered on other
>> CPUs?
>
> No, exception is triggered, but I
> have never seen the race on any
> other CPUs, and none of the people
> who reported that problem to me,
> have seen it on any other CPU.
> I think other CPU just injects the PF
> without doing any vmexit, but I've
> no idea why Core2 does not do the
> same thing. Should it?

Maybe the huge number of injected #PFs (which are triggered because
there's no EPT) contributes to the ease of reproduction? Purely from
looking at the code of your patch, the issue should also happen with
other exceptions; it's just that KVM doesn't inject them that often. It
doesn't mean that we can't craft something from selftests, we just need
to understand the required conditions...

--
Vitaly
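
For reference, the sequence discussed in this thread can be sketched as a toy userspace model. This is only an illustration built from the patch description quoted above, not kernel code: the struct, the toy_* helpers and the `fixed` switch are all hypothetical names, and the "fixed" branch is merely one plausible reading of "set as pending", not necessarily what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for vcpu->arch.exception.{injected,pending}. Hypothetical. */
struct toy_vcpu {
	bool injected;       /* exception already written to the toy VMCS */
	bool pending;        /* exception queued, not yet written */
	bool vmcs_has_event; /* toy stand-in for the VMCS injection field */
};

/*
 * Toy cancel_injection: always removes the event from the toy VMCS.
 * With fixed == false the software flags are left stale, which is the
 * behaviour the patch description complains about.
 */
static void toy_cancel_injection(struct toy_vcpu *v, bool fixed)
{
	v->vmcs_has_event = false;
	if (fixed && v->injected) {
		v->injected = false;
		v->pending = true;
	}
}

/* Toy KVM_SET_REGS: drops a pending exception, as described in the thread. */
static void toy_set_regs(struct toy_vcpu *v)
{
	v->pending = false;
}

/* Toy KVM_RUN entry: returns true if an exception reaches the guest. */
static bool toy_run(struct toy_vcpu *v)
{
	if (v->injected || v->pending) {
		v->vmcs_has_event = true;
		v->injected = true;
		v->pending = false;
		return true;
	}
	return false;
}

/*
 * The unlucky sequence: exception injected, signal forces an exit to
 * userspace (cancel_injection), userspace rewrites registers, next run.
 * Returns true if the stale exception is still delivered.
 */
bool stale_exception_delivered(bool fixed)
{
	struct toy_vcpu v = { .injected = true, .vmcs_has_event = true };

	toy_cancel_injection(&v, fixed);
	assert(!v.vmcs_has_event);
	toy_set_regs(&v);
	return toy_run(&v);
}
```

In this model, stale_exception_delivered(false) reports the re-injection after the register rewrite, while stale_exception_delivered(true) reports that the exception was dropped. The model says nothing about the Core2/EPT question, which only affects how often the cancel path is reached.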