On Thu, Jul 30, 2020 at 4:08 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
>
> On 31.07.20 00:42, Jim Mattson wrote:
> >
> > On Wed, Jul 29, 2020 at 4:59 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
> >>
> >> MSRs are weird. Some of them are normal control registers, such as EFER.
> >> Some however are registers that really are model specific, not very
> >> interesting to virtualization workloads, and not performance critical.
> >> Others again are really just windows into package configuration.
> >>
> >> Out of these MSRs, only the first category is necessary to implement in
> >> kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
> >> certain CPU models and MSRs that contain information on the package level
> >> are much better suited for user space to process. However, over time we have
> >> accumulated a lot of MSRs that are not the first category, but still handled
> >> by in-kernel KVM code.
> >>
> >> This patch adds a generic interface to handle WRMSR and RDMSR from user
> >> space. With this, any future MSR that is part of the latter categories can
> >> be handled in user space.
> >>
> >> Furthermore, it allows us to replace the existing "ignore_msrs" logic with
> >> something that applies per-VM rather than on the full system. That way you
> >> can run productive VMs in parallel to experimental ones where you don't care
> >> about proper MSR handling.
> >>
> >> Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>
> >
> > Can we just drop em_wrmsr and em_rdmsr? The in-kernel emulator is
> > already incomplete, and I don't think there is ever a good reason for
> > kvm to emulate RDMSR or WRMSR if the VM-exit was for some other reason
> > (and we shouldn't end up here if the VM-exit was for RDMSR or WRMSR).
> > Am I missing something?
>
> On certain combinations of CPUs and guest modes, such as real mode on
> pre-Nehalem(?) at least, we are running all guest code through the
> emulator and thus may encounter a RDMSR or WRMSR instruction. I *think*
> we also do so for big real mode on more modern CPUs, but I'm not 100% sure.

Oh, gag me with a spoon! (BTW, we shouldn't have to emulate big real
mode if the CPU supports unrestricted guest mode. If we do, something
is probably wrong.)

> > You seem to be assuming that the instruction at CS:IP will still be
> > RDMSR (or WRMSR) after returning from userspace, and we will come
> > through kvm_{get,set}_msr_user_space again at the next KVM_RUN. That
> > isn't necessarily the case, for a variety of reasons. I think the
>
> Do you have a particular situation in mind where that would not be the
> case and where we would still want to actually complete an MSR operation
> after the environment changed?

As far as userspace is concerned, if it has replied with error=0, the
instruction has completed and retired. If the kernel executes a
different instruction at CS:RIP, the state is certainly inconsistent
for WRMSR exits. It would also be inconsistent for RDMSR exits if the
RDMSR emulation on the userspace side had any side-effects.

> > 'completion' of the userspace instruction emulation should be done
> > with the complete_userspace_io [sic] mechanism instead.
>
> Hm, that would avoid a roundtrip into guest mode, but add a cycle
> through the in-kernel emulator. I'm not sure that's a net win quite yet.
>
> >
> > I'd really like to see this mechanism apply only in the case of
> > invalid/unknown MSRs, and not for illegal reads/writes as well.
>
> Why? Any #GP inducing MSR access will be on the slow path. What's the
> problem if you get a few more of them in user space that you just bounce
> back as failing, so they actually do inject a fault?

I'm not concerned about the performance. I think I'm just biased
because of what we have today. But since we're planning on dropping
that anyway, I take it back. IIRC, the plumbing to make the
distinction is a little painful, and I don't want to ask you to go
there.
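
To make the CS:RIP concern above concrete, here is a toy userland model
of the two resume strategies discussed in this thread. It is only a
sketch, not kernel code: every toy_* name, the struct layout, and the
return conventions are hypothetical and do not correspond to real KVM
symbols. The one hardware fact it relies on is that WRMSR (0F 30) and
RDMSR (0F 32) are two-byte instructions.

```c
/* Toy userland model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSR_INSN_LEN 2 /* WRMSR (0F 30) and RDMSR (0F 32) are 2 bytes */

enum toy_insn { TOY_WRMSR, TOY_OTHER };

struct toy_vcpu {
	enum toy_insn insn_at_rip; /* what currently sits at CS:RIP */
	uint64_t rip;
	int user_error;            /* userspace reply: 0 means "done" */
	bool reply_pending;        /* a userspace reply awaits completion */
	bool gp_injected;
	int (*complete_io)(struct toy_vcpu *v); /* run at next "KVM_RUN" */
};

/* WRMSR exit: hand the access to userspace and remember how to finish. */
static void toy_exit_to_user(struct toy_vcpu *v,
			     int (*hook)(struct toy_vcpu *))
{
	v->reply_pending = true;
	v->complete_io = hook;
}

/* Retire the instruction that caused the exit, or fault it. */
static int toy_complete_msr(struct toy_vcpu *v)
{
	assert(v->reply_pending);
	v->reply_pending = false;
	if (v->user_error)
		v->gp_injected = true;      /* fault: RIP stays put */
	else
		v->rip += TOY_MSR_INSN_LEN; /* success: skip the insn */
	return 1;
}

/* Strategy A: re-enter and rely on WRMSR trapping a second time. */
static bool toy_resume_reexecute(struct toy_vcpu *v)
{
	if (v->insn_at_rip != TOY_WRMSR)
		return false; /* reply never consumed: state inconsistent */
	return toy_complete_msr(v) == 1;
}

/* Strategy B: run the stored hook first, whatever now sits at CS:RIP. */
static bool toy_resume_with_hook(struct toy_vcpu *v)
{
	int (*hook)(struct toy_vcpu *) = v->complete_io;

	if (hook == NULL)
		return false;
	v->complete_io = NULL;
	return hook(v) == 1;
}

/* Run one exit/reply/resume cycle and report the resulting RIP. */
static uint64_t toy_scenario(bool use_hook, bool insn_changed)
{
	struct toy_vcpu v = { .insn_at_rip = TOY_WRMSR, .rip = 0x1000 };

	toy_exit_to_user(&v, toy_complete_msr);
	v.user_error = 0; /* userspace says the write is done */
	if (insn_changed)
		v.insn_at_rip = TOY_OTHER;
	if (use_hook)
		toy_resume_with_hook(&v);
	else
		toy_resume_reexecute(&v);
	return v.rip;
}
```

In this model, toy_scenario(false, true) leaves RIP at 0x1000 even
though userspace already considers the write retired, while
toy_scenario(true, true) advances it to 0x1002. The model deliberately
ignores how the instruction at CS:RIP came to change; it only
illustrates why completing at the next KVM_RUN does not depend on
trapping the same instruction twice.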