On Wed, Oct 16, 2019 at 11:29:00AM +0200, Thomas Gleixner wrote:
> >   - Modify the #AC handler to test/set the same atomic variable as the
> >     sysfs knob.  This is the "disabled by kernel" flow.
>
> That's the #AC in kernel handler, right?

Yes.

> >   - Modify the debugfs/sysfs knob to only allow disabling split-lock
> >     detection.  This is the "disabled globally" path, i.e. sends IPIs to
> >     clear MSR_TEST_CTRL.split_lock on all online CPUs.
>
> Why only disable? What's wrong with reenabling it? The shiny new driver you
> are working on is triggering #AC. So in order to test the fix, you need to
> reboot the machine instead of just unloading the module, reenabling #AC and
> then loading the fixed one?

A re-enabling path adds complexity (though not much) and is undesirable for
a production environment as a split-lock issue in the kernel isn't going to
magically disappear.  And I thought that disable-only was also your
preferred implementation based on a previous comment[*], but that comment
may have been purely in the scope of userspace applications.

Anyways, my personal preference would be to keep things simple and not
support a re-enabling path.  But then again, I do 99.9% of my development
in VMs so my vote probably shouldn't count regarding the module issue.

[*] https://lkml.kernel.org/r/alpine.DEB.2.21.1904180832290.3174@xxxxxxxxxxxxxxxxxxxxxxx

> >   - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
> >     been disabled on *any* CPU via #AC or via the knob.
>
> Fine.
>
> >   - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
> >     actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
> >     guest can do WRMSR and handle its own #AC faults, but KVM doesn't
> >     change the value in hardware.
> >
> >       * Allowing guest to enable split-lock detection can induce #AC on
> >         the host after it has been explicitly turned off, e.g. the sibling
> >         hyperthread hits an #AC in the host kernel, or worse, causes a
> >         different process in the host to SIGBUS.
> >
> >       * Allowing guest to disable split-lock detection opens up the host
> >         to DoS attacks.
>
> Wasn't this discussed before and agreed on that if the host has AC enabled
> that the guest should not be able to force disable it? I surely lost track
> of this completely so my memory might trick me.

Yes, I was restating that point, or at least attempting to.

> The real question is what you do when the host has #AC enabled and the
> guest 'disabled' it and triggers #AC. Is that going to be silently ignored
> or is the intention to kill the guest in the same way as we kill userspace?
>
> The latter would be the right thing, but given the fact that the current
> kernels easily trigger #AC today, that would cause a major wreckage in
> hosting scenarios. So I fear we need to bite the bullet and have a knob
> which defaults to 'handle silently' and allows to enable the kill mechanics
> on purpose. 'Handle silently' needs some logging of course, at least a per
> guest counter which can be queried and a tracepoint.
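
For illustration only, the first three bullets quoted above (shared atomic
variable, disable-only knob, init/resume check) can be modelled in userspace
roughly as follows.  This is a sketch, not kernel code: every name in it
(sld_disabled, msr_split_lock[], sld_kernel_ac(), sld_knob_write(),
sld_cpu_init(), NR_CPUS) is hypothetical, the MSR bit is modelled as a plain
array, the IPI broadcast is modelled as a loop, and it assumes the #AC handler
clears the bit only on the faulting CPU and leaves the others to the
init/resume path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Shared latch: only ever transitions false -> true (no re-enable path). */
static atomic_bool sld_disabled;

/* Models the per-CPU MSR_TEST_CTRL.split_lock bit (true = detection on). */
static bool msr_split_lock[NR_CPUS];

static void sld_clear_on_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	msr_split_lock[cpu] = false;
}

/* init/resume flow: enable detection unless the latch was already set. */
static void sld_cpu_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	msr_split_lock[cpu] = !atomic_load(&sld_disabled);
}

/*
 * "Disabled by kernel" flow: test/set the shared latch from the #AC
 * handler.  Returns true if this call was the one that set the latch.
 * Assumption: only the faulting CPU's bit is cleared here.
 */
static bool sld_kernel_ac(int cpu)
{
	bool was_disabled = atomic_exchange(&sld_disabled, true);

	sld_clear_on_cpu(cpu);
	return !was_disabled;
}

/*
 * "Disabled globally" flow: the knob only accepts disable requests.
 * The loop stands in for sending IPIs to all online CPUs.
 */
static int sld_knob_write(bool enable)
{
	if (enable)
		return -1;	/* re-enabling is not supported */

	atomic_store(&sld_disabled, true);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sld_clear_on_cpu(cpu);
	return 0;
}
```

With that model, a CPU that runs sld_cpu_init() after any call to
sld_kernel_ac() or sld_knob_write(false) comes up with detection off, which
is the "disabled on *any* CPU" behavior from the third bullet.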