On Thu, 2018-12-06 at 20:27 +0000, David Woodhouse wrote:
> On Thu, 2018-12-06 at 10:49 -0800, Andy Lutomirski wrote:
> > > On Dec 6, 2018, at 9:36 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> > > Basically - what is happening is that xen_load_tls() is invalidating the
> > > %gs selector while %gs is still non-NUL.
> > >
> > > If this happens to intersect with a vcpu reschedule, %gs (being non-NUL)
> > > takes precedence over KERNGSBASE, and faults when Xen tries to reload
> > > it. This results in the failsafe callback being invoked.
> > >
> > > I think the correct course of action is to use xen_load_gs_index(0)
> > > (poorly named - it is a hypercall which does swapgs; mov to %gs; swapgs)
> > > before using update_descriptor() to invalidate the segment.
> > >
> > > That will reset %gs to 0 without touching KERNGSBASE, and can be queued
> > > in the same multicall as the update_descriptor() hypercall.
> >
> > Sounds good to me as long as we skip it on native.
>
> Like this?
>
>  #else
> +	struct multicall_space mc = __xen_mc_entry(0);
> +	MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
> +
>  	loadsegment(fs, 0);
>  #endif

That seems to boot and run, at least.

I'm going to experiment with sticking a SCHEDOP_yield in the batch
*after* the update_descriptor requests, to see if I can trigger the
original problem a bit quicker than my current method, which involves
running a hundred machines for a day or two.

Still wondering if the better fix is just to adjust the comments to
admit that xen_failsafe_callback catches the race condition and fixes
it up perfectly, by just letting the %gs selector be zero for a while?
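To make the ordering argument above concrete, here is a toy C model of the race. It is a sketch under stated assumptions, not Xen or Linux code: struct toy_cpu, toy_reschedule(), toy_load_tls_old() and toy_load_tls_new() are hypothetical names, KERNGSBASE and swapgs are not modelled, and the "failsafe" fixup is reduced to zeroing the selector. It only shows why clearing %gs before invalidating its descriptor (as the queued set_segment_base in the patch does) means a reschedule landing in between no longer faults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state: the user %gs selector (0 == NUL) and whether
 * the descriptor it refers to is still valid. Not real Xen/Linux types. */
struct toy_cpu {
    unsigned short gs_sel;  /* current %gs selector, 0 means NUL */
    bool desc_valid;        /* descriptor referenced by gs_sel still valid? */
    int failsafe_calls;     /* times the toy "failsafe callback" ran */
};

/* Toy model of the hypervisor reloading segment state after a vcpu
 * reschedule: a non-NUL selector whose descriptor was invalidated faults,
 * and the toy failsafe path fixes it up by leaving %gs as NUL. */
static void toy_reschedule(struct toy_cpu *c)
{
    if (c->gs_sel != 0 && !c->desc_valid) {
        c->failsafe_calls++;
        c->gs_sel = 0;
    }
}

/* Old ordering (assumed): the descriptor is invalidated while %gs still
 * holds a non-NUL selector; a reschedule in that window faults. */
static void toy_load_tls_old(struct toy_cpu *c, bool resched_in_window)
{
    c->desc_valid = false;      /* stands in for update_descriptor() */
    if (resched_in_window)
        toy_reschedule(c);
    c->gs_sel = 0;              /* %gs only cleared afterwards */
}

/* New ordering (assumed): %gs is zeroed first, as the queued
 * set_segment_base(SEGBASE_GS_USER_SEL, 0) entry would, then the
 * descriptor is invalidated; a reschedule now finds a NUL selector. */
static void toy_load_tls_new(struct toy_cpu *c, bool resched_in_window)
{
    c->gs_sel = 0;              /* stands in for the new multicall entry */
    c->desc_valid = false;      /* stands in for update_descriptor() */
    if (resched_in_window)
        toy_reschedule(c);
}
```

For example, starting from an arbitrary non-NUL selector such as 0x63 with a valid descriptor, toy_load_tls_old(&c, true) bumps failsafe_calls to 1, while toy_load_tls_new(&c, true) leaves it at 0. In both cases %gs ends up NUL, which is the toy counterpart of the point that the failsafe path already "fixes it up" by letting %gs be zero.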