> On Dec 6, 2018, at 9:36 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> On 06/12/2018 17:10, David Woodhouse wrote:
>> On Wed, 2018-11-28 at 08:44 -0800, Andy Lutomirski wrote:
>>>> Can we assume it's always from kernel? The Xen code definitely seems to
>>>> handle invoking this from both kernel and userspace contexts.
>>> I learned that my comment here was wrong shortly after the patch landed :(
>> Turns out the only place I see it getting called from is under
>> __context_switch().
>>
>> #7 [ffff8801144a7cf0] new_xen_failsafe_callback at ffffffffa028028a [kmod_ebxfix]
>> #8 [ffff8801144a7d90] xen_hypercall_update_descriptor at ffffffff8100114a
>> #9 [ffff8801144a7db8] xen_hypercall_update_descriptor at ffffffff8100114a
>> #10 [ffff8801144a7df0] xen_mc_flush at ffffffff81006ab9
>> #11 [ffff8801144a7e30] xen_end_context_switch at ffffffff81004e12
>> #12 [ffff8801144a7e48] __switch_to at ffffffff81016582
>> #13 [ffff8801144a7ea0] __schedule at ffffffff815d2b37
>>
>> That …114a in xen_hypercall_update_descriptor is the 'pop' instruction
>> right after the syscall; it's happening when Xen is preempting the
>> domain in the hypercall and then reloads the segment registers to run
>> that vCPU again later.
>>
>> [ 44185.225289] WARN: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000abbd76060
>>
>> The update_descriptor hypercall args (rdi, rsi) were 0xabbd76060 and 0
>> respectively - it was setting a descriptor at that address, to zero.
>>
>> Xen then failed to load the selector 0x63 into the %gs register (since
>> that descriptor has just been wiped?), leaving it zero.
>>
>> [ 44185.225256] WARN: xen_failsafe_callback from xen_hypercall_update_descriptor+0xa/0x40
>> [ 44185.225263] WARN: DS: 2b/2b ES: 2b/2b FS: 0/0 GS:0/63
>>
>> This is on context switch from a 32-bit task to idle. So
>> xen_failsafe_callback is returning to the "faulting" instruction, with
>> a comment saying "Retry the IRET", but in fact is just continuing on
>> its merry way with %gs unexpectedly set to zero.
>>
>> In fact I think this is probably fine in practice, since it's about to
>> get explicitly set a few lines further down in __context_switch(). But
>> it's odd enough, and far enough away from what's actually said by the
>> comments, that I'm utterly unsure.
>>
>> In xen_load_tls() we explicitly only do the lazy_load_gs(0) for the
>> 32-bit kernel. Is that really right?
>
> Basically - what is happening is that xen_load_tls() is invalidating the
> %gs selector while %gs is still non-NUL.
>
> If this happens to intersect with a vcpu reschedule, %gs (being non-NUL)
> takes precedence over KERNGSBASE, and faults when Xen tries to reload
> it. This results in the failsafe callback being invoked.
>
> I think the correct course of action is to use xen_load_gs_index(0)
> (poorly named - it is a hypercall which does swapgs; mov to %gs; swapgs)
> before using update_descriptor() to invalidate the segment.
>
> That will reset %gs to 0 without touching KERNGSBASE, and can be queued
> in the same multicall as the update_descriptor() hypercall.

Sounds good to me as long as we skip it on native.
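
To make the ordering argument concrete, here is a toy user-space C model of the sequence discussed above. It is only a sketch and not kernel or Xen code: every `toy_*` name, the struct, and the 16-entry table are made up for illustration, and Xen's segment reload on reschedule is reduced to one function. The selector helpers follow the standard x86 layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0), so 0x63 decodes to GDT index 12 with RPL 3, which I believe is the first TLS slot on 64-bit kernels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not kernel or Xen APIs. */
#define TOY_GDT_ENTRIES 16

struct toy_vcpu {
    uint64_t gdt[TOY_GDT_ENTRIES]; /* 0 stands for "descriptor wiped" */
    uint16_t gs_sel;               /* user %gs selector the hypervisor reloads */
    unsigned failsafe_calls;       /* times the failsafe callback would fire */
};

/* x86 selector layout: bits 15..3 index, bit 2 TI, bits 1..0 RPL. */
static inline unsigned sel_index(uint16_t sel) { return sel >> 3; }
static inline unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1; }
static inline unsigned sel_rpl(uint16_t sel)   { return sel & 3; }
static inline bool sel_is_null(uint16_t sel)   { return (sel & ~3u) == 0; }

/* Models the hypervisor reloading %gs when the vCPU is scheduled back in. */
static void toy_reschedule(struct toy_vcpu *v)
{
    if (sel_is_null(v->gs_sel))
        return;                    /* a null selector always loads cleanly */

    unsigned idx = sel_index(v->gs_sel);
    assert(idx < TOY_GDT_ENTRIES);

    if (v->gdt[idx] == 0) {        /* descriptor is gone, so the load faults */
        v->gs_sel = 0;             /* %gs is left at zero ...               */
        v->failsafe_calls++;       /* ... and the guest gets the callback   */
    }
}

/* Order from the report: wipe the descriptor while %gs still selects it. */
static void toy_invalidate_racy(struct toy_vcpu *v, unsigned idx)
{
    v->gdt[idx] = 0;
    toy_reschedule(v);             /* preemption lands right after the wipe */
}

/* Proposed order: reset %gs to 0 first, then wipe the descriptor. */
static void toy_invalidate_safe(struct toy_vcpu *v, unsigned idx)
{
    v->gs_sel = 0;
    toy_reschedule(v);             /* preemption here sees a null %gs */
    v->gdt[idx] = 0;
    toy_reschedule(v);             /* and so does preemption here */
}
```

Starting from a vCPU with `gs_sel = 0x63` and a nonzero `gdt[12]`, the racy order bumps `failsafe_calls` to 1 and leaves `gs_sel` at 0, which matches the `GS:0/63` line in the log. The safe order never reaches the faulting branch, no matter which of the two steps the reschedule follows. The model says nothing about KERNGSBASE or about native; it only illustrates why the reset has to come before the wipe.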