Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Thu, 18 Mar 2021 13:21:01 +0100

On 16.03.21 13:34, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>> This change greatly helps with two issues:
>>>>>>
>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>>
>>>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>>   again.
>>>>>>
>>>>>>   From the user point of view it looks like the CPU never executed a
>>>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>>>   for example, when the breakpoint is placed by an automated script
>>>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>>   continues the guest automatically.
>>>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>>>
>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>>>
>>>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>>>   and let the gdb still stop at the interrupt handler)
>>>>>>
>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>> KVM running normal 'production' VMs.
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>>>>>> Tested-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
>>>>>> ---
>>>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>>  		can_inject = false;
>>>>>>  	}
>>>>>>  
>>>>>> +	/*
>>>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>>>> +	 */
>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>> +		return;
>>>>>
>>>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>>>> from architectural behavior will end in tears at some point.
>>>>>
>>>>
>>>> Does this happen to address this suspicious workaround in the kernel?
>>>>
>>>>         /*
>>>>          * The kernel doesn't use TF single-step outside of:
>>>>          *
>>>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>>>          *  - KGDB, consumed through notify_debug()
>>>>          *
>>>>          * So if we get here with DR_STEP set, something is wonky.
>>>>          *
>>>>          * A known way to trigger this is through QEMU's GDB stub,
>>>>          * which leaks #DB into the guest and causes IST recursion.
>>>>          */
>>>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>>                 regs->flags &= ~X86_EFLAGS_TF;
>>>>
>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>>
>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>> for a long time while trying to fix other issues...
>>>
>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>> much attention since it can only be triggered by
>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>>
>> I've triggered it recently while debugging a guest, that's why I got
>> aware of the code path. Long ago, all this used to work (soft BPs,
>> single-stepping etc.)
>>
>>> The only issue that I on the other hand  did
>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>> when resuming over it, if that breakpoint's python handler messes up 
>>> with gdb's symbols, which is what lx-symbols does.
>>>
>>> And that despite the fact that lx-symbol doesn't mess with the object
>>> (that is the kernel) where the breakpoint is defined.
>>>
>>> Just adding/removing one symbol file is enough to trigger this issue.
>>>
>>> Since lx-symbols already works this around when it reloads all symbols,
>>> I extended that workaround to happen also when loading/unloading 
>>> only a single symbol file.
>>
>> You have no issue with interactive debugging when NOT using gdb scripts
>> / lx-symbol?
> 
> To be honest I don't use guest debugging that much,
> so I probably missed some issues.
> 
> Now that I fixed lx-symbols though I'll probably use 
> guest debugging much more.
> I will keep an eye on any issues that I find.
> 
> The main push to fix lx-symbols actually came
> from me wanting to understand if there is something
> broken with KVM's guest debugging knowing that
> lx-symbols crashes the guest when module is loaded
> after lx-symbols was executed.
> 
> That lx-symbols related guest crash I traced to issue 
> with gdb as I explained, and the lack of blocking of the interrupts 
> on single step is not a bug but more a missing feature
> that should be implemented to make single step easier to use.
> 
> Another issue which isn't a bug is that you can't place a software
> breakpoint if kernel is not loaded (since there is no code in memory)
> or if the kernel haven't done basic paging initialization 
> (since there is no paging yet to know where to place the breakpoint).
> Hardware breakpoints work for this fine though.
> 
> So in summary I haven't found any major issues with KVM's guest debug
> yet.
> 
> If I do notice issues with guest debug, I will try to isolate
> and debug them.
> For the issue that you mentioned, do you have a way to reproduce it?

To pick this up again, I did some experiments and was easily able to
reproduce the main problem:

 - checkout linux master (1df27313f50 yesterday)
 - build a fitting host kernel with KVM
 - build a target kernel with defconfig + CONFIG_KVM_GUEST +
   CONFIG_DEBUG_INFO, no gdb scripts for now
 - boot the guest

   qemu-system-x86_64 linux.img -enable-kvm -smp 4 -s
      -kernel bzImage -append "console=ttyS0 root=... nokaslr"

 - gdb vmlinux
 - tar rem :1234
 - b __x64_sys_execve
 - continue whenever you hit the breakpoint, and the guest will
   eventually hang
 - or stepi, and you may end up in the interrupt handler
 - specifically doing that on the serial console or setting the bp in
   early boot exposes the issue

I've also seen

WARNING: CPU: 3 PID: 751 at ../arch/x86/kernel/traps.c:915
exc_debug+0x16b/0x1c0

from time to time.

When I apply this patch to the host, the problems are gone.

Interestingly, my OpenSUSE 5.3.18-lp152.66-default does not show this
behavior and /seems/ stable from quick testing. Not sure if there were
changes in upstream between that baseline and head or if Suse is
carrying a local fix.

In any case, I think we need the following:

 - default disabling of event injections when guest debugging injected
   TF
 - possibly some control interface to allow events BUT then avoids any
   TF injection to prevent guest state corruptions
 - exposure of that interface to the gdb frontend, maybe by making it
   available via the QEMU monitor (monitor ...)
 - a for kvm-unit-tests to trigger the above corner cases

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux