On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote: > On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote: > > > > I pointed out in earlier email ->invalidate_range can only be > > implemented (as mutually exclusive alternative to > > ->invalidate_range_start/end) by secondary MMUs that shares the very > > same pagetables with the core linux VM of the primary MMU, and those > > invalidate_range are already called by > > __mmu_notifier_invalidate_range_end. > > I have to admit that I didn't notice that fact - that we are already > in the situation that > invalidate_range is called by by the rand_end() nofifier. > > I agree that that should simplify all the code, and means that we > don't have to worry about the few cases that already implemented only > the "invalidate_page()" and "invalidate_range()" cases. > > So I think that simplifies Jérôme's patch further - once you have put > the range_start/end() cases around the inner loop, you can just drop > the invalidate_page() things entirely. > > > So this conversion from invalidate_page to invalidate_range looks > > superflous and the final mmu_notifier_invalidate_range_end should be > > enough. > > Yes. I missed the fact that we already called range() from range_end(). > > That said, the double call shouldn't hurt correctness, and it's > "closer" to old behavior for those people who only did the range/page > ones, so I wonder if we can keep Jérôme's patch in its current state > for 4.13. Yes, the double call doesn't hurt correctness. Keeping it in current state is safer if something, so I've no objection to it other than I'd like to optimize it further if possible, but it can be done later. We're already running the double call in various fast paths too in fact, and rmap walk isn't the fastest path that would be doing such double call, so it's not a major concern. Also not a bug, but one further (but more obviously safe) enhancement I would like is to restrict those rmap invalidation ranges to PAGE_SIZE << compound_order(page) instead of PMD_SIZE/PMD_MASK. + /* + * We have to assume the worse case ie pmd for invalidation. Note that + * the page can not be free in this function as call of try_to_unmap() + * must hold a reference on the page. + */ + end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE); + mmu_notifier_invalidate_range_start(vma->vm_mm, start, end); We don't need to invalidate 2MB of secondary MMU mappings surrounding a 4KB page, just to swapout a 4k page. split_huge_page can't run while holding the rmap locks, so compound_order(page) is safe to use there. It can also be optimized incrementally later. > Because I still want to release 4.13 this weekend, despite this > upheaval. Otherwise I'll have timing problems during the next merge > window. > > Andrea, do you otherwise agree with the whole series as is? I only wish we had more time to test Jerome's patchset, but I sure agree in principle and I don't see regressions in it. The callouts to ->invalidate_page seems to have diminished over time (for the various reasons we know) so if we don't use it for the fast paths, using it only in rmap walk slow paths probably wasn't providing much performance benefit. Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>