> On Apr 29, 2022, at 3:32 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Fri, Apr 29, 2022 at 05:31:16PM +0000, Jon Kohler wrote:
>> Selftests IIUC, but there may be other VMMs that do funny stuff. Said
>> another way, I don't think we actively restrict user space from doing
>> this as far as I know.
>
> "selftests", "there may be"?!
>
> This doesn't sound like a real-life use case to me and we don't do
> changes just because. Sorry.

I appreciate your direct feedback, thank you for helping here. Let's
separate the discussion into two parts:

1: Bug fixing -> an IBPB is being issued unconditionally even when the
   user selects conditional mode. That needs to be fixed, and this
   patch fixes it.

2: Design -> do we even want to have this IBPB here in KVM at all? In
   previous discussions (v1/v2 patch) with Sean, we talked about this
   not making a whole lot of sense in general; however, we landed on
   trying not to regress users who might, for whatever reason, care
   about this IBPB. I've shared a bit more detail on our use case
   below. I'm fine with nuking this IBPB entirely; I just want to be
   mindful of use cases from the rest of the community that we might
   not normally cross in our day to day.

I'm happy to take feedback on both of these parts and integrate it
into a v4 patch, in terms of both the code and the correctness of the
changelog.

>
>> The paranoid aspect here is KVM is issuing an *additional* IBPB on
>> top of what already happens in switch_mm().
>
> Yeah, I know how that works.
>
>> IMHO KVM-side IBPB for most use cases isn't really necessary, but
>> the general concept is that you want to protect a vCPU of guest A
>> from guest B, so you issue a prediction barrier on vCPU switch.
>>
>> *however*, that protection already happens in switch_mm(), because
>> guest A and B are likely to use different mm_structs, so the only
>> point of having this support in KVM seems to be to "kill it with
>> fire" for paranoid users who might be doing some tomfoolery that
>> would somehow bypass the switch_mm() protection (such as somehow
>> sharing an mm_struct).
>
> Yeah, no, this all sounds like something highly hypothetical or there's
> a use case of which you don't want to talk about publicly.

We're an open book here, so I'm happy to share what we're up to
publicly. Our use case is 100% qemu-kvm, which is all separate
processes/mm_structs and is covered a-ok by the switch_mm() path.

We noticed this bug in one of our scalability tests, which
oversubscribes the host with many 2-vCPU VMs and runs a load runner
that increases load one machine at a time, so that we can see the
trend of an app's response time as host load increases. Given that the
host is heavily oversubscribed, there is a lot of vCPU switching, and
in particular switching between vCPUs belonging to different guests,
so we hit this particular barrier in vcpu_load constantly. Taking this
fix helped smooth out that response-time curve a bit. Happy to share
more specific data if you'd like.

>
> Either way, from what I'm reading I'm not in the least convinced that
> this is needed.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette