> On May 14, 2019, at 2:06 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
>> On Tue, May 14, 2019 at 01:33:21PM -0700, Andy Lutomirski wrote:
>> On Tue, May 14, 2019 at 11:09 AM Sean Christopherson
>> <sean.j.christopherson@xxxxxxxxx> wrote:
>>> For IRQs it's somewhat feasible, but not for NMIs since NMIs are unblocked
>>> on VMX immediately after VM-Exit, i.e. there's no way to prevent an NMI
>>> from occurring while KVM's page tables are loaded.
>>>
>>> Back to Andy's question about enabling IRQs, the answer is "it depends".
>>> Exits due to INTR, NMI and #MC are considered high priority and are
>>> serviced before re-enabling IRQs and preemption[1]. All other exits are
>>> handled after IRQs and preemption are re-enabled.
>>>
>>> A decent number of exit handlers are quite short, e.g. CPUID, most RDMSR
>>> and WRMSR, any event-related exit, etc... But many exit handlers require
>>> significantly longer flows, e.g. EPT violations (page faults) and anything
>>> that requires extensive emulation, e.g. nested VMX. In short, leaving
>>> IRQs disabled across all exits is not practical.
>>>
>>> Before going down the path of figuring out how to handle the corner cases
>>> regarding kvm_mm, I think it makes sense to pinpoint exactly what exits
>>> are a) in the hot path for the use case (configuration) and b) can be
>>> handled fast enough that they can run with IRQs disabled. Generating that
>>> list might allow us to tightly bound the contents of kvm_mm and sidestep
>>> many of the corner cases, i.e. select VM-Exits are handled with IRQs
>>> disabled using KVM's mm, while "slow" VM-Exits go through the full context
>>> switch.
>>
>> I suspect that the context switch is a bit of a red herring. A
>> PCID-don't-flush CR3 write is IIRC under 300 cycles. Sure, it's slow,
>> but it's probably minor compared to the full cost of the vm exit. The
>> pain point is kicking the sibling thread.
>
> Speaking of PCIDs, a separate mm for KVM would mean consuming another
> ASID, which isn't good.

I’m not sure we care. We have many logical address spaces (two per mm
plus a few more). We have 4096 PCIDs, but we only use ten or so. And we
have some undocumented number of *physical* ASIDs with some undocumented
mechanism by which PCID maps to a physical ASID.

I don’t suppose you know how many physical ASIDs we have? And how it
interacts with the VPID stuff?
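
For reference, the arithmetic behind the "4096 PCIDs" and "PCID-don't-flush
CR3 write" remarks can be sketched as below. This is a minimal, userspace-only
illustration. The bit positions are architectural (PCID in CR3 bits 11:0 when
CR4.PCIDE is set, bit 63 of the value written to CR3 suppressing the flush),
but build_cr3() and pcid_count() are hypothetical names made up for this
example, not kernel functions. Nothing here writes CR3, and it says nothing
about how PCIDs map to physical ASIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural: with CR4.PCIDE=1, CR3 bits 11:0 carry the PCID. */
#define CR3_PCID_BITS 12
#define CR3_PCID_MASK ((UINT64_C(1) << CR3_PCID_BITS) - 1) /* 0xfff */

/* Architectural: bit 63 of the MOV-to-CR3 source means "don't flush". */
#define CR3_NOFLUSH (UINT64_C(1) << 63)

/* Hypothetical helper: number of distinct PCID values, 2^12. */
static inline unsigned int pcid_count(void)
{
	return 1u << CR3_PCID_BITS;
}

/*
 * Hypothetical helper: compose the value that would be written to CR3.
 * pgd_phys is assumed to be a page-aligned physical address of the
 * top-level page table; its low 12 bits are discarded.
 */
static inline uint64_t build_cr3(uint64_t pgd_phys, uint16_t pcid, bool noflush)
{
	uint64_t cr3;

	assert(pcid <= CR3_PCID_MASK); /* PCID must fit in 12 bits */

	cr3 = (pgd_phys & ~CR3_PCID_MASK) | pcid;
	if (noflush)
		cr3 |= CR3_NOFLUSH;
	return cr3;
}
```

With these definitions, build_cr3(0x1000, 5, true) yields 0x1005 with bit 63
set, i.e. "switch to the page tables at 0x1000 under PCID 5 and keep whatever
the TLB already has tagged with that PCID".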