On Thu, Nov 5, 2020 at 7:00 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> I don't know why this was addressed to me in particular (easy to imagine
> I've made a mod at some time that bears on this, but I haven't found it);
> but have spent longer considering the patch than I should have done -
> apologies to everyone else I should be replying to.
>

I really appreciate your insights and historical anecdotes. I always
learn something new.

> On Wed, 4 Nov 2020, Shakeel Butt wrote:
>
> > Since the commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> > semantic v2"), the code to check the secondary MMU's page table access
> > bit is broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from
> > the secondary MMU's page table before the check. More specifically for
> > those secondary MMUs which unmap the memory in
> > mmu_notifier_invalidate_range_start() like kvm.
>
> Well, "broken" seems a bit unfair to 369ea8242c0f. It put a warning
> mmu_notifier_invalidate_range_start() at the beginning, and matching
> mmu_notifier_invalidate_range_end() at the end of try_to_unmap_one();
> with its mmu_notifier_invalidate_range() exactly where the
> mmu_notifier_invalidate_page() was before (I think the story gets
> more complicated later). Yes, if notifiee takes invalidate_range_start()
> as signal to invalidate all their own range, then that will sometimes
> cause them unnecessary invalidations.
>
> Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
> case meeting a VM_LOCKED vma and setting PageMlocked where that had
> been missed earlier (and page_check_references() has intentionally but
> confusingly marked this case as PAGEREF_RECLAIM, not to reclaim the page,
> but to reach the try_to_unmap_one() which will recognize and fix it up -
> historically easier to do there than in page_referenced_one()).
>
> But I think mmu_notifier is a diversion from what needs thinking about.
>
> > However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the
> > absence of TTU_IGNORE_ACCESS and it explicitly performs the page table
> > access check before trying to unmap the page. So, at worst the reclaim
> > will miss accesses in a very short window if we remove page table access
> > check in unmapping code.
>
> I agree with you and Johannes that the short race window when the page
> might be re-referenced is no issue at all: the functional issue is the
> one in your next paragraph. If that's agreed by memcg guys, great,
> then this patch is a nice observation and a welcome cleanup.
>
> > There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg
> > reclaim. From memcg reclaim the page_referenced() only account the
> > accesses from the processes which are in the same memcg of the target
> > page but the unmapping code is considering accesses from all the
> > processes, so, decreasing the effectiveness of memcg reclaim.
>
> Are you sure it was unintended?
>
> Since the dawn of memcg reclaim, it has been the case that a recent
> reference in a "foreign" vma has rescued that page from being reclaimed:
> now you propose to change that. I expect some workflows will benefit
> and others be disadvantaged. I have no objection myself to the change,
> but I do think it needs to be better highlighted here, and explicitly
> agreed by those more familiar with memcg reclaim.

The reason I said unintended was due to bed7161a519a2 ("Memory
controller: make page_referenced() cgroup aware").
From the commit message it seems like the intention was to not be
influenced by foreign accesses during memcg reclaim, but it missed
making try_to_unmap_one() memcg aware. I agree with you that this is a
behavior change, and we have to explicitly agree to not let memcg
reclaim be influenced by foreign accesses.
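
To make that asymmetry concrete, here is a rough sketch, paraphrased
from ~v5.9 mm/rmap.c (abridged, not verbatim): page_referenced() skips
vmas outside the target memcg through its invalid_vma callback, while
the young-bit test done during unmapping under !TTU_IGNORE_ACCESS
applies to every mapping regardless of memcg:

/*
 * (1) page_referenced(): when a target memcg is passed in, the rmap
 *     walk skips vmas whose mm belongs to a different cgroup, so
 *     foreign accesses are never counted.
 */
static bool invalid_page_referenced_vma(struct vm_area_struct *vma, void *arg)
{
	struct page_referenced_arg *pra = arg;
	struct mem_cgroup *memcg = pra->memcg;

	if (!mm_match_cgroup(vma->vm_mm, memcg))
		return true;	/* foreign vma: skip it */

	return false;
}

/*
 * (2) try_to_unmap_one(): the young-bit check has no memcg filter, so
 *     a recent access from *any* process aborts the unmap.
 */
	if (!(flags & TTU_IGNORE_ACCESS)) {
		if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) {
			ret = false;
			page_vma_mapped_walk_done(&pvmw);
			break;
		}
	}

The patch removes check (2) entirely, leaving the memcg-aware
page_referenced() as the only reference test on the reclaim path.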