On Mon, 7 May 2018 13:35:58 +1000 Paul Mackerras <paulus@xxxxxxxxxx> wrote: > On Mon, May 07, 2018 at 12:53:42PM +1000, Nicholas Piggin wrote: > > On Mon, 7 May 2018 12:00:58 +1000 > > Paul Mackerras <paulus@xxxxxxxxxx> wrote: > > > > > On Sun, May 06, 2018 at 05:37:25PM +1000, Nicholas Piggin wrote: > > > > The radix fault handler will unmap a page table if it wants to install > > > > a huge page there instead. It will just clear that page table entry, > > > > free the page, and flush the pwc. > > > > > > This is a pretty inadequate commit description. You are describing > > > what the code does before your patch, but you don't say why that is > > > bad or what your patch does and why that is better. > > > > > > In particular, I don't see how we could ever get the situation where > > > we are replacing a page table page with a huge page and there are any > > > valid leaf PTEs in (or below) that page table page. I accept that I > > > am leaking the page table pages but I don't see why tlbies would be > > > necessary. > > > > > > What scenario did you have in mind where we would need to do the full > > > unmap, that is, where would would find any valid leaf PTEs in a page > > > table page that we are replacing? > > > > I actually wasn't 100% sure if a fault could ever try to install a > > 64k pte when another could install a 2MB (without an interleaving > > invalidation). > > I don't see how that could happen. A THP collapse should invalidate > the range before installing the 2M PTE. The code in > kvmppc_book3s_radix_page_fault() wouldn't make a different decision > for two subpages of the same 2M page. Okay I'll change the patch. > That said, that code should be > including the memslot address limits in deciding whether to use a > large/huge page or not. > > > If that's not required I can take it out of this patch, I'd like to > > add it back in a subsequent change to do proper range invalidates > > that don't leave these stale page tables around for the fault path > > to clean up, but that discussion can be left until then. > > If you're using the kvmppc_unmap_pte helper to do a range invalidate, > it would be nice to avoid doing the gfn_to_memslot each time. That seems like it should be possible, I'll see if I can make it work. Thanks, Nick -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html