On 2020-04-08 7:12 a.m., Thomas Gleixner wrote:
Ankur Arora <ankur.a.arora@xxxxxxxxxx> writes:
A KVM host (or another hypervisor) might advertise paravirtualized
features and optimization hints (e.g. KVM_HINTS_REALTIME) which might
become stale over the lifetime of the guest. For instance, the
host might go from being undersubscribed to being oversubscribed
(or the other way round) and it would make sense for the guest
to switch pv-ops based on that.
If your host changes its advertised behaviour then you want to fix the
host setup or find a competent admin.
This locktorture splat that I saw on the guest while testing this is
indicative of the problem:
[ 1136.461522] watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [lock_torture_wr:12865]
[ 1136.461542] CPU: 8 PID: 12865 Comm: lock_torture_wr Tainted: G W L 5.4.0-rc7+ #77
[ 1136.461546] RIP: 0010:native_queued_spin_lock_slowpath+0x15/0x220
(Caused by an oversubscribed host but using mismatched native pv_lock_ops
on the guest.)
And this illustrates what? The fact that you used a misconfigured setup.
This series addresses the problem by doing paravirt switching at
runtime.
You're not addressing the problem. You're fixing the symptom, which is
wrong to begin with.
The alternative use-case is a runtime version of apply_alternatives()
(not posted with this patch-set) that can be used for some safe subset
of X86_FEATUREs. This could be useful in conjunction with the ongoing
late microcode loading work that Mihai Carabas and others have been
working on.
This has been discussed to death before and there is no safe subset as
long as this hasn't been resolved:
https://lore.kernel.org/lkml/alpine.DEB.2.21.1909062237580.1902@xxxxxxxxxxxxxxxxxxxxxxx/
Thanks. I was thinking of a fairly limited subset: e.g. re-evaluate
X86_FEATURE_ALWAYS to make sure static_cpu_has() reflects reality
but I guess that has second-order effects here.
Ankur
Thanks,
tglx
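
For readers following the thread, the mismatch under discussion can be sketched roughly as follows. This is a simplified userspace model, not kernel code: KVM documents KVM_HINTS_REALTIME as bit 0 of EDX in CPUID leaf 0x40000001, and a guest that sees the hint assumes its vCPUs are not preempted and keeps native spinlocks. The names hint_realtime(), choose_lock_ops() and enum lock_ops are hypothetical and exist only for this illustration; the real guest decision depends on further conditions that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of KVM_HINTS_REALTIME in EDX of CPUID leaf 0x40000001,
 * as described in the KVM CPUID documentation. */
#define KVM_HINTS_REALTIME 0

/* Hypothetical stand-in for the two flavours of lock ops discussed above. */
enum lock_ops {
	LOCK_OPS_NATIVE,	/* plain queued spinlocks, no hypervisor help */
	LOCK_OPS_PARAVIRT	/* paravirt spinlocks, tolerate vCPU preemption */
};

/* Decode the hint from a raw EDX value (passed in so that the sketch
 * does not depend on running under KVM). */
static bool hint_realtime(uint32_t hints_edx)
{
	return (hints_edx >> KVM_HINTS_REALTIME) & 1u;
}

/* Simplified boot-time choice: trust the hint and go native if the host
 * claims vCPUs are never preempted, otherwise use paravirt locks. */
static enum lock_ops choose_lock_ops(uint32_t hints_edx)
{
	return hint_realtime(hints_edx) ? LOCK_OPS_NATIVE : LOCK_OPS_PARAVIRT;
}
```

In this model the choice is made once from the value seen at boot. If the host later becomes oversubscribed while the guest still runs with LOCK_OPS_NATIVE, the guest is in the situation that produced the soft lockup quoted above.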