On 08/10/2013 07:38, Kashyap Chamarthy wrote:
> On Mon, Oct 7, 2013 at 6:29 PM, Kashyap Chamarthy <kashyap.cv@xxxxxxxxx> wrote:
>> Gleb, so I just did a trace of KVM MMU to try to understand why L2 is
>> stuck with shadow on EPT
>
> Paolo, were you able to reproduce this again? Yesterday, on #qemu you
> mentioned you'll test it again :-)

Yes, I could reproduce it too.

>> Boot L2 guest:

Here L2 doesn't go past the second instruction.  It gets a page fault
even though the spte is present, and KVM then loops on a page fault
for 0xfe05b.

Here is an annotated function_graph trace of L1.  It's possible that
L0 is injecting the same fault repeatedly, i.e. they are not different
faults from the processor.  I'll get an L0 trace next.

---- KVM executes at 0xfffffff0 via emulation

kvm_cpu_has_pending_timer [kvm]() {
  apic_has_pending_timer [kvm]();
}
kvm_check_async_pf_completion [kvm]();
vmx_save_host_state [kvm_intel]();
__srcu_read_unlock();
rcu_note_context_switch();
/* kvm_entry: vcpu 0 */
vmx_vcpu_run [kvm_intel]();
vmx_read_l1_tsc [kvm_intel]();
vmx_handle_external_intr [kvm_intel]() {
}
__srcu_read_lock();
vmx_handle_exit [kvm_intel]() {
  guest_state_valid.part.27 [kvm_intel]() {
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
  }
  vmx_interrupt_allowed [kvm_intel]();
  x86_emulate_instruction [kvm]() {
    init_emulate_ctxt [kvm]() {
      vmx_get_cs_db_l_bits [kvm_intel]() {
        vmx_read_guest_seg_ar [kvm_intel]();
      }
      vmx_get_rflags [kvm_intel]();
    }
    x86_decode_insn [kvm]() {
      do_insn_fetch [kvm]() {
        ...
        kvm_fetch_guest_virt [kvm]() {
          vmx_get_cpl [kvm_intel]();
          kvm_read_guest_virt_helper [kvm]() {
            nonpaging_gva_to_gpa [kvm]();
            kvm_read_guest [kvm]() {
              kvm_read_guest_page [kvm]() {
                gfn_to_hva_prot [kvm]() {
                  __gfn_to_hva_many [kvm]();
                }
              }
            }
          } /* kvm_read_guest_virt_helper [kvm] */
        }
      }
      ...
    }
    x86_emulate_insn [kvm]() {
      /* kvm_emulate_insn: ffff0000:fff0:ea 5b e0 00 f0 (real) */
      em_jmp_far [kvm]() {
        load_segment_descriptor [kvm]() {
          emulator_get_segment [kvm]() {
            vmx_get_segment [kvm_intel]() {
              vmx_read_guest_seg_selector [kvm_intel]();
            }
          }
          emulator_set_segment [kvm]() {
            vmx_set_segment [kvm_intel]() {
              fix_rmode_seg [kvm_intel]() {
                vmcs_writel [kvm_intel]();
                vmcs_writel [kvm_intel]();
                vmcs_writel [kvm_intel]();
                vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                vmcs_writel [kvm_intel]();
              }
              emulation_required [kvm_intel]() {
                guest_state_valid.part.27 [kvm_intel]() {
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  }
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  }
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  }
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  }
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  } /* rmode_segment_valid [kvm_intel] */
                  rmode_segment_valid [kvm_intel]() {
                    vmx_get_segment [kvm_intel]() {
                      vmx_read_guest_seg_selector [kvm_intel]();
                    }
                    vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
                  }
                }
              }
            }
          }
        }
      }
      writeback [kvm]();
      writeback_registers [kvm]();
    }
    vmx_get_interrupt_shadow [kvm_intel]();
    vmx_set_interrupt_shadow [kvm_intel]();
    vmx_get_rflags [kvm_intel]();
    kvm_set_rflags [kvm]() {
      vmx_set_rflags [kvm_intel]() {
        vmcs_writel [kvm_intel]();
      }
    }
  }
  guest_state_valid.part.27 [kvm_intel]() {
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
    rmode_segment_valid [kvm_intel]() {
      vmx_get_segment [kvm_intel]() {
        vmx_read_guest_seg_selector [kvm_intel]();
      }
      vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
    }
  }
  emulation_required [kvm_intel]() {
    guest_state_valid.part.27 [kvm_intel]() {
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
      rmode_segment_valid [kvm_intel]() {
        vmx_get_segment [kvm_intel]() {
          vmx_read_guest_seg_selector [kvm_intel]();
        }
        vmx_segment_access_rights.isra.25.part.26 [kvm_intel]();
      }
    }
  }
}

---- KVM executes at f000:e05b and gets a page fault

kvm_cpu_has_pending_timer [kvm]() {
  apic_has_pending_timer [kvm]();
}
kvm_check_async_pf_completion [kvm]();
kvm_apic_accept_events [kvm]();
kvm_cpu_has_injectable_intr [kvm]() {
  kvm_apic_accept_pic_intr [kvm]();
}
vmx_interrupt_allowed [kvm_intel]();
kvm_cpu_has_injectable_intr [kvm]() {
  kvm_apic_accept_pic_intr [kvm]();
}
enable_irq_window [kvm_intel]() {
  vmcs_writel [kvm_intel]();
}
vmx_save_host_state [kvm_intel]();
__srcu_read_unlock();
rcu_note_context_switch();
/* kvm_entry: vcpu 0 */
vmx_vcpu_run [kvm_intel]() {
  vmcs_writel [kvm_intel]();
  vmcs_writel [kvm_intel]();
  perf_guest_get_msrs();
  /* kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e */
}
vmx_read_l1_tsc [kvm_intel]();
vmx_handle_external_intr [kvm_intel]();
__srcu_read_lock();
vmx_handle_exit [kvm_intel]() {
  handle_exception [kvm_intel]() {
    /* kvm_page_fault: address fe05b error_code 14 */
    kvm_mmu_page_fault [kvm]() {
      nonpaging_page_fault [kvm]() {
        mmu_topup_memory_caches [kvm]();
        gfn_to_memslot_dirty_bitmap.isra.67 [kvm]() {
          gfn_to_memslot [kvm]();
        }
        mapping_level.isra.86 [kvm]() {
          kvm_host_page_size [kvm]() {
            gfn_to_hva [kvm]() {
              __gfn_to_hva_many [kvm]();
            }
            down_read() {
              _cond_resched();
            }
            find_vma();
            vma_kernel_pagesize();
            up_read();
          }
        }
        try_async_pf [kvm]() {
          gfn_to_pfn_async [kvm]() {
            __gfn_to_pfn [kvm]() {
              __gfn_to_pfn_memslot [kvm]() {
                __gfn_to_hva_many [kvm]();
                __get_user_pages_fast() {
                  gup_pud_range() {
                    gup_pte_range();
                  }
                }
              }
            }
          }
        }
        handle_abnormal_pfn [kvm]();
        _raw_spin_lock();
        make_mmu_pages_available.isra.80 [kvm]();
        transparent_hugepage_adjust.isra.91 [kvm]() {
          kvm_is_mmio_pfn [kvm]();
        }
        __direct_map.isra.104 [kvm]() {
          shadow_walk_init [kvm]();
          kvm_mmu_get_page [kvm]() {
            pte_list_add [kvm]();
            /* kvm_mmu_get_page: sp gen 2 gfn 0 1 q0 direct wux !nxe root 0 sync new */
          }
          link_shadow_page.isra.63 [kvm]() {
            mmu_spte_set [kvm]();
          }
          mmu_set_spte [kvm]() {
            set_spte [kvm]() {
              mark_page_dirty [kvm]();
              mmu_spte_update [kvm]() {
                mmu_spte_set [kvm]();
              }
            }
            kvm_mmu_page_get_gfn [kvm]();
            gfn_to_rmap [kvm]() {
              gfn_to_memslot [kvm]();
            }
            pte_list_add [kvm]();
            kvm_release_pfn_clean [kvm]() {
              kvm_is_mmio_pfn [kvm]();
              put_page();
            }
          }
          __direct_pte_prefetch [kvm]() {
            kvm_mmu_page_get_gfn [kvm]();
            gfn_to_memslot_dirty_bitmap.isra.67 [kvm]() {
              gfn_to_memslot [kvm]();
            }
            gfn_to_page_many_atomic [kvm]() {
              __gfn_to_hva_many [kvm]();
              __get_user_pages_fast() {
                gup_pud_range() {
                  gup_pte_range();
                }
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
            mmu_set_spte [kvm]() {
              set_spte [kvm]() {
                mark_page_dirty [kvm]();
                mmu_spte_update [kvm]() {
                  mmu_spte_set [kvm]();
                }
              }
              kvm_mmu_page_get_gfn [kvm]();
              gfn_to_rmap [kvm]() {
                gfn_to_memslot [kvm]();
              }
              pte_list_add [kvm]();
              kvm_release_pfn_clean [kvm]() {
                kvm_is_mmio_pfn [kvm]();
                put_page();
              }
            }
          }
        }
      }
    }
  }
}

---- up to this point the trace is identical for shadow-on-shadow (working)
---- and shadow-on-EPT (not working)
----
---- for shadow-on-EPT L1 gets another page fault even though the spte is
---- present. The error code is identical.

kvm_cpu_has_pending_timer [kvm]() {
  apic_has_pending_timer [kvm]();
}
kvm_check_async_pf_completion [kvm]();
vmx_save_host_state [kvm_intel]();
__srcu_read_unlock();
rcu_note_context_switch();
/* kvm_entry: vcpu 0 */
vmx_vcpu_run [kvm_intel]() {
  vmcs_writel [kvm_intel]();
  vmcs_writel [kvm_intel]();
  perf_guest_get_msrs();
  /* kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e */
}
vmx_read_l1_tsc [kvm_intel]();
vmx_handle_external_intr [kvm_intel]();
__srcu_read_lock();
vmx_handle_exit [kvm_intel]() {
  handle_exception [kvm_intel]() {
    /* kvm_page_fault: address fe05b error_code 14 */
    kvm_mmu_page_fault [kvm]() {
      nonpaging_page_fault [kvm]() {
        mmu_topup_memory_caches [kvm]();
        gfn_to_memslot_dirty_bitmap.isra.67 [kvm]() {
          gfn_to_memslot [kvm]();
        }
        mapping_level.isra.86 [kvm]() {
          kvm_host_page_size [kvm]() {
            gfn_to_hva [kvm]() {
              __gfn_to_hva_many [kvm]();
            }
            down_read() {
              _cond_resched();
            }
            find_vma();
            vma_kernel_pagesize();
            up_read();
          }
        }
        try_async_pf [kvm]() {
          gfn_to_pfn_async [kvm]() {
            __gfn_to_pfn [kvm]() {
              __gfn_to_pfn_memslot [kvm]() {
                __gfn_to_hva_many [kvm]();
                __get_user_pages_fast() {
                  gup_pud_range() {
                    gup_pte_range();
                  }
                }
              }
            }
          }
        }
        handle_abnormal_pfn [kvm]();
        _raw_spin_lock();
        make_mmu_pages_available.isra.80 [kvm]();
        transparent_hugepage_adjust.isra.91 [kvm]() {
          kvm_is_mmio_pfn [kvm]();
        }
        __direct_map.isra.104 [kvm]() {
          shadow_walk_init [kvm]();
>>>>>> here starts the difference with the second part,
>>>>>> this shows that the spte is present
          mmu_set_spte [kvm]() {
            set_spte [kvm]() {
              mark_page_dirty [kvm]();
              mmu_spte_update [kvm]() {
                spte_has_volatile_bits [kvm]();
              }
            }
            kvm_release_pfn_clean [kvm]() {
              kvm_is_mmio_pfn [kvm]();
              put_page();
            }
          }
          __direct_pte_prefetch [kvm]();
        }
      }
    }
  }
}

---- this part of the trace then loops endlessly

Paolo
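
The numbers in the kvm_emulate_insn, kvm_exit and kvm_page_fault lines of the trace above can be decoded by hand. What follows is only a minimal Python sketch of that arithmetic, not anything taken from KVM itself: the function names are made up for illustration, the bit layouts are the architectural ones (VM-exit interruption information and x86 page-fault error code), and it assumes that the two "info" values are the exit qualification and the interruption information, and that error_code is printed in hexadecimal.

```python
import struct


def real_mode_linear(segment, offset):
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


def decode_jmp_far16(insn):
    """Decode a 16-bit 'jmp far ptr16:16' (opcode 0xEA) into (segment, offset)."""
    opcode, offset, segment = struct.unpack("<BHH", insn)
    if opcode != 0xEA:
        raise ValueError("not a far jump")
    return segment, offset


def decode_intr_info(info):
    """Split a VM-exit interruption-information value into its fields."""
    return {
        "valid": bool((info >> 31) & 1),
        "has_error_code": bool((info >> 11) & 1),
        "type": (info >> 8) & 0x7,   # 3 means hardware exception
        "vector": info & 0xFF,       # 14 means #PF
    }


def decode_pf_error_code(code):
    """Split an x86 page-fault error code into its flag bits."""
    return {
        "present": bool(code & 0x01),
        "write": bool(code & 0x02),
        "user": bool(code & 0x04),
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


# Bytes from the kvm_emulate_insn line: ea 5b e0 00 f0
segment, offset = decode_jmp_far16(bytes.fromhex("ea 5b e0 00 f0"))
print(hex(segment), hex(offset))                 # 0xf000 0xe05b
print(hex(real_mode_linear(segment, offset)))    # 0xfe05b
print(decode_intr_info(0x80000B0E))
print(decode_pf_error_code(0x14))                # assumption: "14" is hex
```

Under those assumptions, the far jump at the reset vector targets f000:e05b, whose linear address 0xfe05b matches the faulting address; 80000b0e is a valid hardware exception with vector 14 (#PF) and an error code; and error code 0x14 reads as a user-mode instruction fetch from a page reported as not present.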