Re: [PATCH] KVM: X86: Fix exception untrigger on ret to user

stsp <stsp2@xxxxxxxxx> · Mon, 28 Jun 2021 13:32:43 +0300

28.06.2021 13:07, Vitaly Kuznetsov wrote:
stsp <stsp2@xxxxxxxxx> writes:

28.06.2021 10:20, Vitaly Kuznetsov wrote:
Stas Sergeev <stsp2@xxxxxxxxx> writes:

When returning to user space, special care is taken with an
exception that was already injected into the VMCS but not yet delivered
to the guest. cancel_injection removes such an exception from the VMCS.
It is set as pending, and if the user does KVM_SET_REGS, it gets
completely canceled.

This didn't happen though, because vcpu->arch.exception.injected
and vcpu->arch.exception.pending were not updated in
cancel_injection. As a result, KVM_SET_REGS didn't cancel
anything, and the exception was re-injected on the next KVM_RUN,
even though the guest registers (like EIP) were already modified.
This led to an exception coming from the "wrong place".
It shouldn't be that hard to reproduce this in selftests, I
believe.
Unfortunately the problem happens only on Core2 CPUs. My guess is that
more modern CPUs do not go to software for the exception
injection?
Hm, I've completely missed that from the original description. As I read
it, 'cancel_injection' path in vcpu_enter_guest() is always broken when
vcpu->arch.exception.injected is set as we forget to clear it...

Yes, cancel_injection does indeed look
always broken. But there is a bit
more to it.
Namely:
- Other CPUs do not seem to take
that path. My guess is that they
just handle the exception in hardware,
without returning to KVM for that. I
am not sure why Core2 takes a vmexit
on each page fault. Is it incapable of
handling the PF in hardware, or is
some other bug involved?

- Even if you follow the broken
path, in most cases everything is still
fine: the exception will just be re-injected.
The unfortunate scenario is when
_TIF_SIGPENDING is set at exactly the
right moment. Then you go to user space,
and user space is unlucky enough to call
SET_REGS right there. These conditions
are not very likely to happen. I wrote a
test case for it, but it involves an entire
buildroot setup and you need to wait
a while for it to trigger the race.
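
To make the sequence above concrete, here is a toy user-space model of
the two flags. It is only a sketch of the behaviour described in this
thread, not kernel code: every toy_* name and the "fixed" switch are
made up for illustration, and the real logic in vcpu_enter_guest() and
__set_regs() is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in vcpu->arch.exception. */
struct toy_exception {
    bool injected; /* written to the VMCS, not yet delivered to the guest */
    bool pending;  /* queued, to be injected on the next guest entry */
};

/*
 * Models the cancel_injection path taken when the vCPU returns to
 * user space (e.g. because a signal is pending). With fixed == false
 * the flags are left stale, which is the bug discussed in this thread.
 */
static void toy_cancel_injection(struct toy_exception *e, bool fixed)
{
    if (fixed && e->injected) {
        e->injected = false;
        e->pending = true;
    }
}

/* Models KVM_SET_REGS as described above: it drops a pending exception. */
static void toy_set_regs(struct toy_exception *e)
{
    e->pending = false;
}

/* Models the next KVM_RUN: true if the old exception is still delivered. */
static bool toy_next_run_delivers(const struct toy_exception *e)
{
    return e->injected || e->pending;
}

/* Runs: exception injected -> exit to user -> SET_REGS -> next KVM_RUN. */
static bool toy_scenario(bool fixed)
{
    struct toy_exception e = { .injected = true, .pending = false };

    toy_cancel_injection(&e, fixed);
    toy_set_regs(&e);
    assert(!e.pending); /* SET_REGS always clears pending in this model */
    return toy_next_run_delivers(&e);
}
```

In this model toy_scenario(false) returns true (the stale exception is
still delivered after the registers were changed), while
toy_scenario(true) returns false.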

'exception.injected' can even be set through
KVM_SET_VCPU_EVENTS and then we call KVM_SET_REGS.
Does this mean I shouldn't add WARN_ON_ONCE()?
WARN_ON_ONCE() is fine IMO as long as there's no valid case where
'vcpu->arch.exception.injected' is set during __set_regs().

But you said:

'exception.injected' can even be set through
KVM_SET_VCPU_EVENTS and then we call KVM_SET_REGS.

... which makes such a scenario valid?

Alternatively, we can
trigger a real exception from the guest. Could you maybe add something
like this to tools/testing/selftests/kvm/x86_64/set_sregs_test.c?
Even if you have the right CPU to reproduce that (Core2), you also
need _TIF_SIGPENDING at the right moment to provoke the cancel_injection
path. This is like triggering a race. If you don't get _TIF_SIGPENDING
then it will just re-enter the guest and inject the exception properly.
I'd like to understand the hardware dependency first. Is it possible
that the exception which causes the problem is not triggered on other
CPUs?

No, the exception is triggered, but I
have never seen the race on any
other CPU, and none of the people
who reported the problem to me
have seen it on any other CPU.
I think other CPUs just inject the PF
without doing any vmexit, but I have
no idea why Core2 does not do the
same thing. Should it?