>>> On 11.04.18 at 09:08, <jgross@xxxxxxxx> wrote: > On 14/03/18 09:48, Jan Beulich wrote: >>>>> On 26.02.18 at 15:08, <jgross@xxxxxxxx> wrote: >>> @@ -35,6 +40,9 @@ void xen_arch_post_suspend(int cancelled) >>> >>> static void xen_vcpu_notify_restore(void *data) >>> { >>> + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) >>> + wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl)); >>> + >>> /* Boot processor notified via generic timekeeping_resume() */ >>> if (smp_processor_id() == 0) >>> return; >>> @@ -44,7 +52,15 @@ static void xen_vcpu_notify_restore(void *data) >>> >>> static void xen_vcpu_notify_suspend(void *data) >>> { >>> + u64 tmp; >>> + >>> tick_suspend_local(); >>> + >>> + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) { >>> + rdmsrl(MSR_IA32_SPEC_CTRL, tmp); >>> + this_cpu_write(spec_ctrl, tmp); >>> + wrmsrl(MSR_IA32_SPEC_CTRL, 0); >>> + } >>> } >> >> While investigating ways how to do something similar on our old, >> non-pvops kernels I've started wondering if this solution is actually >> correct in all cases. Of course discussing this is complicated by the >> fact that the change there might be a conflict with hasn't landed >> in Linus'es tree yet (see e.g. >> https://patchwork.kernel.org/patch/10153843/ for an upstream >> submission; I haven't been able to find any discussion on that >> patch or why it isn't upstream yet), but we have it in our various >> branches. The potential problem I'm seeing is with the clearing >> and re-setting of SPEC_CTRL around CPUs going idle. While the >> active CPU could have preemption disabled (if that isn't the case >> already), the passive CPUs are - afaict - neither under full control >> of drivers/xen/manage.c:do_suspend() nor excluded yet from >> any further scheduling activity. Hence with code like this (taken >> from one of our branches) >> >> static void mwait_idle(void) >> { >> if (!current_set_polling_and_test()) { >> trace_cpu_idle_rcuidle(1, smp_processor_id()); >> if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) { >> smp_mb(); /* quirk */ >> clflush((void *)¤t_thread_info()->flags); >> smp_mb(); /* quirk */ >> } >> >> x86_disable_ibrs(); >> >> __monitor((void *)¤t_thread_info()->flags, 0, 0); >> if (!need_resched()) >> __sti_mwait(0, 0); >> else >> local_irq_enable(); >> >> x86_enable_ibrs(); >> ... >> >> the MSR might get set to non-zero again after having been >> cleared by the code your patch adds. I therefore think that the >> only race free solution would be to do the clearing from >> stop-machine context. But maybe I'm overlooking something. > > Currently and with the above mentioned patch there is no problem: Xen pv > guests always use default_idle(), so mwait_idle() eventually playing > with MSR_IA32_SPEC_CTRL won't affect us. It's pretty unclear to me why default_idle() doesn't have this - in Xen we do it for all idle routines. > In order to ensure that won't change in future default_idle() should > never modify MSR_IA32_SPEC_CTRL. In case something like that would be > required we should rather add another idle function doing that. This looks like a pretty strange/non-obvious requirement, which I don't think anyone would remember. Additionally, x86 maintainers: is there a particular reason this (or any functionally equivalent patch) isn't upstream yet? As indicated before, I had not been able to find any discussion, and hence I see no reason why this is a patch we effectively carry privately in our distro branches (and likely other distros do so too). Jan