06.07.2021 14:49, Maxim Levitsky пишет:
Now about the KVM's userspace API where this is exposed:
I see now too that KVM_SET_REGS clears the pending exception.
This is new to me and it is IMHO *wrong* thing to do.
However I bet that someone somewhere depends on this,
since this behavior is very old.
What alternative would you suggest?
Check for ready_for_interrupt_injection
and never call KVM_SET_REGS if it indicates
"not ready"?
But what if someone calls it nevertheless?
Perhaps return an error from KVM_SET_REGS
if exception is pending? Also KVM_SET_SREGS
needs some treatment here too, as it can
also be called when an exception is pending,
leading to problems.
This doesn't affect qemu since when it does KVM_SET_REGS,
it also does KVM_SET_VCPU_EVENTS afterward, and that
restores either pending or injected exception state.
(that state is first read with KVM_GET_VCPU_EVENTS ioctl).
Also note just for reference that KVM_SET_SREGS has ability
to set a pending interrupt, something that is also redundant
since KVM_SET_VCPU_EVENTS does this.
I recently added KVM_SET_SREGS2 ioctl which now lacks this
ability.
KVM_SET_VCPU_EVENTS/KVM_GET_VCPU_EVENTS allows to read/write
state of both pending and injected exception and on top of that
allows to read/write the exception payload of a pending exception.
You should consider using it to preserve/modify the exception state,
although the later isn't recommended (qemu does this in few places,
but it is a bit buggy, especially in regard to nested guests).
I need neither to preserve nor modify
the exception state. All I need is to safely
call KVM_SET_REGS/KVM_SET_SREGS, but
that appears unsafe when exception is
pending.
Please take a look into this commit:
https://www.lkml.org/lkml/2020/12/1/324
It was suggested that the removal
of "!kvm_event_needs_reinjection(vcpu)"
condition from kvm_vcpu_ready_for_interrupt_injection()
is what introduced the problem, as
right now ready_for_interrupt_injection
doesn't account for pending exception.
I already added the check to my
user-space code, and now it works
reliably on some very old kernels
prior to the aforementioned patch.
So should we treat that as a regressions,
or any other proposal?