On Thu, Aug 31, 2017 at 01:01:25AM +0200, Andrea Arcangeli wrote:
> On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote:
> > On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> > >
> > > I pointed out in earlier email ->invalidate_range can only be
> > > implemented (as mutually exclusive alternative to
> > > ->invalidate_range_start/end) by secondary MMUs that share the very
> > > same pagetables with the core linux VM of the primary MMU, and those
> > > invalidate_range are already called by
> > > __mmu_notifier_invalidate_range_end.
> >
> > I have to admit that I didn't notice that fact - that we are already
> > in the situation that
> > invalidate_range is called by the range_end() notifier.
> >
> > I agree that that should simplify all the code, and means that we
> > don't have to worry about the few cases that already implemented only
> > the "invalidate_page()" and "invalidate_range()" cases.
> >
> > So I think that simplifies Jérôme's patch further - once you have put
> > the range_start/end() cases around the inner loop, you can just drop
> > the invalidate_page() things entirely.
> >
> > > So this conversion from invalidate_page to invalidate_range looks
> > > superfluous and the final mmu_notifier_invalidate_range_end should be
> > > enough.
> >
> > Yes. I missed the fact that we already called range() from range_end().
> >
> > That said, the double call shouldn't hurt correctness, and it's
> > "closer" to old behavior for those people who only did the range/page
> > ones, so I wonder if we can keep Jérôme's patch in its current state
> > for 4.13.
>
> Yes, the double call doesn't hurt correctness. Keeping it in current
> state is safer if anything, so I've no objection to it other than I'd
> like to optimize it further if possible, but it can be done later.
>
> We're already running the double call in various fast paths too in
> fact, and rmap walk isn't the fastest path that would be doing such
> double call, so it's not a major concern.
>
> Also not a bug, but one further (but more obviously safe) enhancement
> I would like is to restrict those rmap invalidation ranges to
> PAGE_SIZE << compound_order(page) instead of PMD_SIZE/PMD_MASK.
>
> +	/*
> +	 * We have to assume the worse case ie pmd for invalidation. Note that
> +	 * the page can not be free in this function as call of try_to_unmap()
> +	 * must hold a reference on the page.
> +	 */
> +	end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE);
> +	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
>
> We don't need to invalidate 2MB of secondary MMU mappings surrounding
> a 4KB page, just to swap out a 4KB page. split_huge_page can't run while
> holding the rmap locks, so compound_order(page) is safe to use there.
>
> It can also be optimized incrementally later.

This optimization is safe, I believe. Linus, I can respin with that and
with further kvm dead code removal.

Jérôme
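
For reference, here is a rough userspace sketch of the two range
computations discussed above. It is not kernel code: the constants are
stand-ins that assume 4KB base pages and 2MB PMDs, and the helper names
(min_ul, pmd_range_end, order_range_end) are made up for illustration.
The order argument stands in for compound_order(page).

```c
/*
 * Userspace sketch only. PAGE_SIZE/PMD_SIZE/PMD_MASK below are stand-in
 * definitions (assumed: 4KB pages, 2MB PMDs), not the kernel headers.
 */
#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PMD_SHIFT  21UL
#define PMD_SIZE   (1UL << PMD_SHIFT)
#define PMD_MASK   (~(PMD_SIZE - 1))

/* Hypothetical helper: the smaller of two unsigned longs. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * End of the range as computed in the quoted hunk: assume the worst
 * case and extend to the end of the PMD containing start, clamped to
 * the end of the vma.
 */
static unsigned long pmd_range_end(unsigned long start, unsigned long vm_end)
{
	return min_ul(vm_end, (start & PMD_MASK) + PMD_SIZE);
}

/*
 * End of the range as suggested: cover PAGE_SIZE << order bytes from
 * start, clamped to the end of the vma. order 0 is a single base page.
 */
static unsigned long order_range_end(unsigned long start, unsigned long vm_end,
				     unsigned int order)
{
	return min_ul(vm_end, start + (PAGE_SIZE << order));
}
```

With start = 0x201000 and vm_end = 0x800000, the PMD-based end is
0x400000 (almost 2MB invalidated), while the order-0 end is 0x202000
(one 4KB page). For an order-9 page starting at 0x200000 both give
0x400000.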