On 13/02/2023 13:50, Paolo Bonzini wrote:
> On 2/13/23 13:44, Jeremi Piotrowski wrote:
>> Just built a kernel from that tree, and it displays the same behavior.
>> The problem is not that the addresses are wrong, but that the flushes
>> are issued at the wrong time now. At least for what "enlightened NPT
>> TLB flush" requires.
> 
> It is not clear to me why HvCallFlushGuestPhysicalAddressSpace or
> HvCallFlushGuestPhysicalAddressList would have stricter requirements
> than a "regular" TLB shootdown using INVEPT.
> 
> Can you clarify what you mean by wrong time, preferably with some kind
> of sequence of events?
> 
> That is, something like
> 
>   CPU 0  Modify EPT from ... to ...
>   CPU 0  call_rcu() to free page table
>   CPU 1  ... which is invalid because ...
>   CPU 0  HvCallFlushGuestPhysicalAddressSpace
> 
> Paolo

So I looked at the ftrace output (all kvm & kvmmu events + hyperv_nested_*
events) and I see the following:

With tdp_mmu=0:

  kvm_exit
  sequence of kvm_mmu_prepare_zap_page
  hyperv_nested_flush_guest_mapping (always follows every sequence of
    kvm_mmu_prepare_zap_page)
  kvm_entry

With tdp_mmu=1 I see:

  kvm_mmu_prepare_zap_page and kvm_tdp_mmu_spte_changed events from a
  kworker context, but they are not followed by
  hyperv_nested_flush_guest_mapping. The only
  hyperv_nested_flush_guest_mapping events I see happen from the qemu
  process context.

Also the number of flush hypercalls is significantly lower: a 7-second
sequence through OVMF with tdp_mmu=0 produces ~270 flush hypercalls. In
the traces with tdp_mmu=1 I now see at most 3.

So this might be easier to diagnose than I thought: the
HvCallFlushGuestPhysicalAddressSpace calls are missing now.
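
In case it helps with comparing runs, below is roughly the kind of
throwaway helper I'd use to tally those tracepoints in a plain-text
ftrace dump (e.g. the output of "trace-cmd report" or a copy of
/sys/kernel/tracing/trace); it just substring-matches "<event>:" per
line, so treat the numbers as approximate:

/*
 * count_flush_events.c - rough counter for the tracepoints discussed
 * above. Assumes a plain-text ftrace dump where each event shows up
 * as "<event name>:" after the timestamp.
 */
#include <stdio.h>
#include <string.h>

static const char * const events[] = {
	"kvm_mmu_prepare_zap_page",
	"kvm_tdp_mmu_spte_changed",
	"hyperv_nested_flush_guest_mapping",
};
#define NR_EVENTS (sizeof(events) / sizeof(events[0]))

int main(int argc, char **argv)
{
	unsigned long counts[NR_EVENTS] = { 0 };
	char line[4096], needle[64];
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "r"))) {
		fprintf(stderr, "usage: %s <ftrace text dump>\n", argv[0]);
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		for (size_t i = 0; i < NR_EVENTS; i++) {
			/* ftrace prints the event name followed by ':' */
			snprintf(needle, sizeof(needle), "%s:", events[i]);
			if (strstr(line, needle))
				counts[i]++;
		}
	}
	fclose(f);

	for (size_t i = 0; i < NR_EVENTS; i++)
		printf("%-36s %lu\n", events[i], counts[i]);

	return 0;
}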