Hi Zheng,

On 11/03/2019 16:31, Zheng Xiang wrote:
> Hi all,
>
> While a page is merged into a transparent huge page, KVM will invalidate Stage-2 for
> the base address of the huge page and the whole of Stage-1.
> However, this only invalidates the first page within the huge page and the other
> pages are not invalidated, see below:
>
> +---------------+--------------+
> |abcde       2MB-Page          |
> +---------------+--------------+
>
> TLB before setting new pmd:
> +---------------+--------------+
> |      VA       |   PAGESIZE   |
> +---------------+--------------+
> |       a       |     4KB      |
> +---------------+--------------+
> |       b       |     4KB      |
> +---------------+--------------+
> |       c       |     4KB      |
> +---------------+--------------+
> |       d       |     4KB      |
> +---------------+--------------+
>
> TLB after setting new pmd:
> +---------------+--------------+
> |      VA       |   PAGESIZE   |
> +---------------+--------------+
> |       a       |     2MB      |
> +---------------+--------------+
> |       b       |     4KB      |
> +---------------+--------------+
> |       c       |     4KB      |
> +---------------+--------------+
> |       d       |     4KB      |
> +---------------+--------------+
>
> When the VM accesses address *b*, it will hit the TLB and result in TLB conflict aborts or other potential exceptions.

That's really bad. I can only imagine two scenarios:

1) We fail to unmap a,b,c,d (and potentially another 508 PTEs), losing
the PTE table in the process, and place the PMD instead. I can't see
this happening.

2) We fail to invalidate on unmap, and that's slightly less bad (but
still quite bad).

Which of the two cases are you seeing?

> For example, we need to keep track of the VM memory dirty pages when the VM is in live migration.
> KVM will set the memslot READONLY and split the huge pages.
> After live migration is canceled or aborted, the pages will be merged into THP.
> Later accesses to these pages, which are READONLY, will cause level-3 Permission Faults until they are invalidated.
>
> So should we invalidate the tlb entries for all related pages (e.g. a,b,c,d), like __flush_tlb_range()?
> Or we can call __kvm_tlb_flush_vmid() to invalidate all tlb entries.

We should perform an invalidate on each unmap. unmap_stage2_range seems
to do the right thing. __flush_tlb_range only caters for Stage1
mappings, and __kvm_tlb_flush_vmid() is too big a hammer, as it nukes
TLBs for the whole VM.

I'd really like to understand what you're seeing, and how to reproduce
it. Do you have a minimal example I could run on my own HW?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...