On 8/2/17 5:42 AM, Paolo Bonzini wrote:
> On 01/08/2017 15:36, Brijesh Singh wrote:
>>> The flow is:
>>>
>>> hardware walks page table; L2 page table points to read only memory
>>> -> pf_interception (code =
>>> -> kvm_handle_page_fault (need_unprotect = false)
>>> -> kvm_mmu_page_fault
>>> -> paging64_page_fault (for example)
>>> -> try_async_pf
>>> map_writable set to false
>>> -> paging64_fetch(write_fault = true, map_writable = false,
>>> prefault = false)
>>> -> mmu_set_spte(speculative = false, host_writable = false,
>>> write_fault = true)
>>> -> set_spte
>>> mmu_need_write_protect returns true
>>> return true
>>> write_fault == true -> set emulate = true
>>> return true
>>> return true
>>> return true
>>> emulate
>>>
>>> Without this patch, emulation would have called
>>>
>>> ..._gva_to_gpa_nested
>>> -> translate_nested_gpa
>>> -> paging64_gva_to_gpa
>>> -> paging64_walk_addr
>>> -> paging64_walk_addr_generic
>>> set fault (nested_page_fault=true)
>>>
>>> and then:
>>>
>>> kvm_propagate_fault
>>> -> nested_svm_inject_npf_exit
>>>
>> maybe then safer thing would be to qualify the new error_code check with
>> !mmu_is_nested(vcpu) or something like that. So that way it would run on
>> L1 guest, and not the L2 guest. I believe that would restrict it avoid
>> hitting this case. Are you okay with this change ?
> Or check "vcpu->arch.mmu.direct_map"? That would be true when not using
> shadow pages.

Yes that can be used.

>> IIRC, the main place where this check was valuable was when L1 guest had
>> a fault (when coming out of the L2 guest) and emulation was not needed.
> How do I measure the effect? I tried counting the number of emulations,
> and any difference from the patch was lost in noise.

I think this patch is necessary for functional reasons (not just perf),
because we added the other patch to look at the GPA and stop walking the
guest page tables on a NPF.

The issue, I think, was that hardware had taken an NPF because the page
table is marked RO, and it saved the GPA in the VMCB. KVM then went to
emulate the instruction and saw that a GPA was available. But that GPA
was not the GPA of the instruction it is emulating; it was the GPA of
the tablewalk page that had the fault. We debugged that at the time and
realized that emulating the instruction was unnecessary, so we added
this new code, which fixed the functional issue and helps perf.

I don't have any data on how much perf. As I recall, it was most
effective when the L1 guest page tables and the L2 nested page tables
were exactly the same. In that case, it avoided emulations for code that
L1 executes, which I think could be as much as one emulation per 4 KB
code page.
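
To make the proposed qualification concrete, here is a rough standalone model of the check, not the actual patch. Everything prefixed model_/MODEL_ is a hypothetical stand-in invented for illustration, and the bit positions are assumptions rather than values copied from the kernel headers. The idea is only that a write NPF taken on a present page during the guest page table walk gets handled by unprotecting the page and retrying, and only when direct_map is set, so the shadow-paging (nested) case still goes down the normal path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; assumed values, not the kernel's definitions. */
#define MODEL_PFERR_PRESENT    (1ULL << 0)   /* page was present */
#define MODEL_PFERR_WRITE      (1ULL << 1)   /* access was a write */
#define MODEL_PFERR_GUEST_PAGE (1ULL << 33)  /* fault hit during the guest table walk */

/* All three bits must be set for the "skip emulation" shortcut to apply. */
#define MODEL_PFERR_NESTED_GUEST_PAGE \
	(MODEL_PFERR_GUEST_PAGE | MODEL_PFERR_WRITE | MODEL_PFERR_PRESENT)

/* Hypothetical, stripped-down vcpu: only the flag discussed in this thread. */
struct model_vcpu {
	bool direct_map;	/* true when not using shadow pages */
};

enum model_action {
	MODEL_UNPROTECT_AND_RETRY,	/* unprotect the page, re-run the guest */
	MODEL_NORMAL_FAULT_PATH,	/* fall through; may end up emulating */
};

/*
 * Decide what to do with a nested page fault. The saved GPA belongs to the
 * tablewalk page, not to the instruction, so emulating here would use the
 * wrong GPA; the shortcut avoids that, but only with direct_map set.
 */
static enum model_action model_handle_npf(const struct model_vcpu *vcpu,
					  uint64_t error_code)
{
	if (vcpu->direct_map &&
	    (error_code & MODEL_PFERR_NESTED_GUEST_PAGE) ==
	     MODEL_PFERR_NESTED_GUEST_PAGE)
		return MODEL_UNPROTECT_AND_RETRY;

	return MODEL_NORMAL_FAULT_PATH;
}
```

With direct_map set and all three bits present in error_code, the model returns MODEL_UNPROTECT_AND_RETRY; clearing direct_map, or dropping any one of the bits, sends the fault down MODEL_NORMAL_FAULT_PATH, which is the behavior the L2 case in Paolo's flow needs.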