On Wed, May 09, 2018 at 12:20:17PM +1000, Nicholas Piggin wrote:
> When partition scope mappings are unmapped with kvm_unmap_radix, the
> pte is cleared, but the page table structure is left in place. If the
> next page fault requests a different page table geometry (e.g., due to
> THP promotion or split), kvmppc_create_pte is responsible for changing
> the page tables.
>
> When a page table entry is to be converted to a large pte, the page
> table entry is cleared, the PWC flushed, then the page table it points
> to freed. This will cause pte page tables to leak when a 1GB page is
> to replace a pud entry that points to a pmd table with pte tables
> under it: The pmd table will be freed, but its pte tables will be missed.
>
> Fix this by replacing the simple clear and free code with one that
> walks down the page tables and frees children. Care must be taken to
> clear the root entry being unmapped and then flush the PWC before
> freeing any page tables, as explained in comments.
>
> This requires PWC flush to logically become a flush-all-PWC (which it
> already is in hardware, but the KVM API needs to be changed to avoid
> confusion).
>
> This code also checks that no unexpected pte entries exist in any page
> table being freed, and unmaps those and emits a WARN. This is an
> expensive operation for the pte page level, but partition scope
> changes are rare, so it's unconditional for now to flush out bugs.
>
> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>

This will conflict with Aneesh's patch "powerpc/kvm: Switch kvm pmd
allocator to custom allocator", which Michael Ellerman has put into his
topic/ppc-kvm branch.  Please adjust this patch to use
kvmppc_pmd_alloc/free so it can go on top of Aneesh's patch.

Paul.
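
For reference, the ordering described in the quoted commit message (clear
the root entry, flush the PWC, then walk down and free every child table)
can be sketched as a small userspace toy model. This is only an
illustration, not the kernel code: every toy_* name is hypothetical, the
tables are tiny, and the "PWC flush" is just a flag so the ordering can be
checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ENTRIES 4   /* tiny tables; real radix tables are far larger */

struct toy_table;

/* One slot: empty, a leaf mapping, or a pointer to a lower-level table. */
struct toy_entry {
    struct toy_table *child;  /* non-NULL: points to a child table */
    bool leaf;                /* true: a (possibly large) leaf mapping */
};

struct toy_table {
    struct toy_entry e[TOY_ENTRIES];
};

static int toy_tables_freed;   /* how many tables were freed */
static int toy_stray_leaves;   /* leaf entries found in tables being freed */
static bool toy_pwc_flushed;   /* stands in for the flush-all-PWC operation */

static struct toy_table *toy_alloc_table(void)
{
    struct toy_table *t = calloc(1, sizeof(*t));
    assert(t != NULL);
    return t;
}

static void toy_flush_all_pwc(void)
{
    toy_pwc_flushed = true;
}

/* Free a table and everything below it. levels_below is the number of
 * table levels underneath this one (0 for the lowest, pte-like level). */
static void toy_free_subtree(struct toy_table *t, int levels_below)
{
    /* No table may be freed before the flush has happened. */
    assert(toy_pwc_flushed);

    for (size_t i = 0; i < TOY_ENTRIES; i++) {
        if (t->e[i].leaf)
            toy_stray_leaves++;  /* the real patch unmaps these and WARNs */
        else if (t->e[i].child && levels_below > 0)
            toy_free_subtree(t->e[i].child, levels_below - 1);
    }
    free(t);
    toy_tables_freed++;
}

/* Replace a table pointer with a large leaf: clear, flush, then free. */
static void toy_replace_with_leaf(struct toy_entry *slot, int levels_below)
{
    struct toy_table *old = slot->child;

    slot->child = NULL;          /* 1. clear the root entry */
    slot->leaf = false;
    toy_flush_all_pwc();         /* 2. flush cached walks */
    if (old)                     /* 3. only now free the old subtree */
        toy_free_subtree(old, levels_below);
    slot->leaf = true;           /* install the large mapping */
}

/* A pud-like slot pointing to a pmd-like table with two pte-like tables. */
int toy_demo(void)
{
    struct toy_entry pud_slot = {0};
    struct toy_table *pmd = toy_alloc_table();

    pmd->e[0].child = toy_alloc_table();
    pmd->e[2].child = toy_alloc_table();
    pud_slot.child = pmd;

    toy_tables_freed = 0;
    toy_stray_leaves = 0;
    toy_pwc_flushed = false;
    toy_replace_with_leaf(&pud_slot, 1);
    return toy_tables_freed;
}
```

In this model toy_demo() frees the pmd-like table and both pte-like
tables under it, which is the case the old clear-and-free code missed.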