----- On Aug 20, 2021, at 6:49 PM, Sean Christopherson seanjc@xxxxxxxxxx wrote:

> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions.  While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the current CPU needs to be updated if
> the task is migrated.
>
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

>
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley <pefoley@xxxxxxxxxx>
> Bisected-by: Doug Evans <dje@xxxxxxxxxx>
> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  kernel/entry/kvm.c |  4 +++-
>  kernel/rseq.c      | 14 +++++++++++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
>  		if (ti_work & _TIF_NEED_RESCHED)
>  			schedule();
>
> -		if (ti_work & _TIF_NOTIFY_RESUME)
> +		if (ti_work & _TIF_NOTIFY_RESUME) {
>  			tracehook_notify_resume(NULL);
> +			rseq_handle_notify_resume(NULL, NULL);
> +		}
>
>  		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
>  		if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..6d45ac3dae7f 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -282,9 +282,17 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>
>  	if (unlikely(t->flags & PF_EXITING))
>  		return;
> -	ret = rseq_ip_fixup(regs);
> -	if (unlikely(ret < 0))
> -		goto error;
> +
> +	/*
> +	 * regs is NULL if and only if the caller is in a syscall path.  Skip
> +	 * fixup and leave rseq_cs as is so that rseq_syscall() will detect and
> +	 * kill a misbehaving userspace on debug kernels.
> +	 */
> +	if (regs) {
> +		ret = rseq_ip_fixup(regs);
> +		if (unlikely(ret < 0))
> +			goto error;
> +	}
>  	if (unlikely(rseq_update_cpu_id(t)))
>  		goto error;
>  	return;
> --
> 2.33.0.rc2.250.ged5fa647cd-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com