On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx> wrote: > On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote: >> >> On 07/21/2018 07:30 PM, Andy Lutomirski wrote: >>> >>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx> >>> wrote: >>>> >>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote: >>>>> >>>>> >>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx> >>>>> wrote: >>>>>> >>>>>> >>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for >>>>>> exceptions/interrupts, to reduce speculation attack surface") >>>>>> unintendedly >>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end >>>>>> of >>>>>> xen_failsafe_callback before the latter jumps to error_exit. >>>>>> error_exit expects the %rbx register to be a flag indicating whether >>>>>> there should be a return to kernel mode. >>>>>> >>>>>> This commit makes sure that the %rbx register is not cleared by >>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is >>>>>> instantiated >>>>>> by xen_failsafe_callback, to avoid the issue. >>>>> >>>>> >>>>> >>>>> Seems like a genuine problem, but: >>>>> >>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >>>>>> index c7449f377a77..96e8ff34129e 100644 >>>>>> --- a/arch/x86/entry/entry_64.S >>>>>> +++ b/arch/x86/entry/entry_64.S >>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) >>>>>> addq $0x30, %rsp >>>>>> UNWIND_HINT_IRET_REGS >>>>>> pushq $-1 /* orig_ax = -1 => not a system call */ >>>>>> - PUSH_AND_CLEAR_REGS >>>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0 >>>>>> ENCODE_FRAME_POINTER >>>>>> jmp error_exit >>>>> >>>>> >>>>> >>>>> The old code first set RBX to zero then, if frame pointers are on, >>>>> sets it to some special non-zero value, then crosses its fingers and >>>>> hopes for the best. Your patched code just skips the zeroing part, so >>>>> RBX either contains the ENCODE_FRAME_POINTER result or is >>>>> uninitialized. >>>>> >>>>> How about actually initializing rbx to something sensible like, say, 1? >>>> >>>> >>>> Hello Andy, >>>> >>>> Thank you for the review! Apparently, I have not done my homework fully. >>>> I will test your suggestion and report back, most likely in a few hours. >>>> >>>> I have been testing with the next/linux-next tree's master branch >>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the >>>> frame pointer (i.e., RBP) register, as opposed to the RBX register, >>>> which the patch aims to avoid changing before jumping to error_exit. >>>> It is possible that I am missing something though -- I am not sure about >>>> the connection between the RBP and RBX registers. >>> >>> >>> Sorry, brain fart on my part. >> >> >> No problem! :-) >> >>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below. >>>> Would >>>> it >>>> be valid to state that the original code had the same issue that you >>>> referred >>>> to (i.e., leaving the RBX register uninitialized)? >>> >>> >>> Presumably. >>> >>> I would propose a rather different fix: >>> >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793 >>> >>> Any chance you could test that and see if it fixes your problem? >> >> >> Of course; I will report back with the result in a few hours. > > > Hello Andy, > > I confirm that the commit at [1] resolves the issue in question as well. > > To test, I first reverted my commit, applied your commit and verified that > the bug cannot be reproduced. Afterwards, I reverted your commit and > verified that the bug is reproducible. > > I am not sure about the best way to document the bug I encountered in your > commit message, but in case you plan to have your commit merged, please > feel free to add a "Reported-and-tested-by: M. Vefa Bicakci > <m.v.b@xxxxxxxxxx>" > tag to the commit message, possibly with a link to this e-mail thread. > > Finally, as I had mentioned in my commit message, this bug exists in all > kernel versions 4.14 and greater, so it would be nice if you could > carbon-copy > "stable@xxxxxxxxxxxxxxx" in the commit message as well. > > Thank you, I'm curious why the sigreturn_64 selftest didn't catch this bug. Do you happen to know why? The xen_failsafe_callback mechanism is a bit mysterious to me. > > Vefa > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793 >