Re: Causing VMEXITs when kprobes are hit in the guest VM

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 11 May 2022 13:59:14 +0000

On Wed, May 11, 2022, Jim Mattson wrote:
> On Fri, May 6, 2022 at 11:31 PM Arnabjyoti Kalita
> <akalita@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Dear Sean and all,
> >
> > When a VMEXIT happens of type "KVM_EXIT_DEBUG" because a hardware
> > breakpoint was triggered when an instruction was about to be executed,
> > does the instruction where the breakpoint was placed actually execute
> > before the VMEXIT happens?
> >
> > I am attempting to record the occurrence of the debug exception in
> > userspace. I do not want to do anything extra with the debug
> > exception. I have modified the kernel code (handle_exception_nmi) to
> > do something like this -
> >
> > case BP_VECTOR:
> >     /*
> >      * Update instruction length as we may reinject #BP from
> >      * user space while in guest debugging mode. Reading it for
> >      * #DB as well causes no harm, it is not used in that case.
> >      */
> >     vmx->vcpu.arch.event_exit_inst_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> >     kvm_run->exit_reason = KVM_EXIT_DEBUG;
> >     ......
> >     kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
> >     kvm_run->debug.arch.exception = ex_no;
> >     /* Change: update RIP here */
> >     kvm_rip_write(vcpu, rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
> >     break;
> >
> > This allows the guest to proceed after the hardware breakpoint
> > exception is triggered. However, the guest kernel keeps running into
> > page faults at arbitrary points in time, so I'm not sure whether I
> > need to handle something else too.
> >
> > I have modified the userspace code to not trigger any exception; it
> > just records the occurrence of this VMEXIT and lets the guest continue.
> >
> > Is this the right approach?
> 
> Probably not. I'm not sure how kprobes work, but the tracepoint hooks
> at function entry are multi-byte nopl instructions. The int3
> instruction that raises a #BP exception is only one byte. If you advance
> past that byte, you will try to execute the remaining bytes of the
> original nopl. You want to skip past the entire nopl.
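
To illustrate Jim's point, here is a minimal standalone sketch (plain C, not
kernel code). It assumes the probed site is the common 5-byte NOP encoding
0f 1f 44 00 00; the helper names (write_nop5, arm_probe, naive_resume_offset,
full_skip_offset) are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC
#define INT3_LEN    1
#define NOP5_LEN    5

/* 5-byte NOP: nopl 0x0(%rax,%rax,1). Assumed here as the function-entry hook. */
static const uint8_t nop5[NOP5_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical helper: place the 5-byte NOP at `site`. */
void write_nop5(uint8_t *site)
{
    memcpy(site, nop5, NOP5_LEN);
}

/*
 * Hypothetical helper: arm a breakpoint by overwriting only the first
 * byte of the instruction at `site` with INT3. Returns the saved byte.
 */
uint8_t arm_probe(uint8_t *site)
{
    uint8_t saved = site[0];

    site[0] = INT3_OPCODE;
    return saved;
}

/* Offset reached by advancing RIP by the INT3 length only. */
size_t naive_resume_offset(void)
{
    return INT3_LEN;
}

/* Offset reached by skipping the whole original instruction. */
size_t full_skip_offset(size_t orig_insn_len)
{
    return orig_insn_len;
}
```

After arm_probe(), the byte at naive_resume_offset() is 0x1f, i.e. the middle
of the original nopl, so the guest would start decoding from the wrong place.
Only full_skip_offset(NOP5_LEN) lands on the next real instruction, and that
only works when the original instruction length is known.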

And kprobes aren't the only thing that will generate #BP, e.g. the kernel uses
INT3 for code patching, userspace debuggers in the guest can insert INT3, etc.
The correct thing to do is to re-inject the #BP back into the guest without
touching RIP.
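
For illustration only, a rough sketch of how a userspace VMM on x86 might
re-inject the #BP after recording the KVM_EXIT_DEBUG exit, using the
KVM_GET_VCPU_EVENTS / KVM_SET_VCPU_EVENTS ioctls. The function names
(prepare_bp_reinject, reinject_bp) and the vcpu_fd parameter are assumptions
made for this example, and it presumes the kernel-side handler is left
unmodified so that RIP still points at the INT3.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

#ifndef BP_VECTOR
#define BP_VECTOR 3   /* x86 breakpoint exception vector */
#endif

/* Hypothetical helper: mark a #BP (no error code) as injected. */
void prepare_bp_reinject(struct kvm_vcpu_events *ev)
{
    ev->exception.injected = 1;
    ev->exception.nr = BP_VECTOR;
    ev->exception.has_error_code = 0;
    ev->exception.error_code = 0;
}

/*
 * Hypothetical helper: queue the #BP on the vCPU behind `vcpu_fd`.
 * RIP is deliberately not modified. Returns 0 on success, -1 on error.
 */
int reinject_bp(int vcpu_fd)
{
    struct kvm_vcpu_events ev;

    /* Start from the current event state so other fields are preserved. */
    if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &ev) < 0)
        return -1;

    prepare_bp_reinject(&ev);

    if (ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &ev) < 0)
        return -1;

    return 0;
}
```

The VMM would call reinject_bp() once per KVM_EXIT_DEBUG whose
debug.arch.exception is 3, after logging it and before the next KVM_RUN. The
guest's own #BP handler (kprobes, text patching, a debugger) then decides how
far to advance.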