On Tue, Aug 01, 2023 at 02:48:37PM +0200, David Hildenbrand wrote: > Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by > gup_can_follow_protnone()") missed that follow_page() and > follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really > don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting > or due to inaccessible (PROT_NONE) VMAs. > > As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page > faults from gup/gup_fast"): "Other follow_page callers like KSM should not > use FOLL_NUMA, or they would fail to get the pages if they use follow_page > instead of get_user_pages." > > liubo reported [1] that smaps_rollup results are imprecise, because they > miss accounting of pages that are mapped PROT_NONE. Further, it's easy > to reproduce that KSM no longer works on inaccessible VMAs on x86-64, > because pte_protnone()/pmd_protnone() also indictaes "true" in > inaccessible VMAs, and follow_page() refuses to return such pages right > now. > > As KVM really depends on these NUMA hinting faults, removing the > pte_protnone()/pmd_protnone() handling in GUP code completely is not really > an option. > > To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT > to restore the original behavior for now and add better comments. > > Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in > is_valid_gup_args(), to add that flag for all external GUP users. > > Note that there are three GUP-internal __get_user_pages() users that don't > end up calling is_valid_gup_args() and consequently won't get > FOLL_HONOR_NUMA_FAULT set. > > 1) get_dump_page(): we really don't want to handle NUMA hinting > faults. It specifies FOLL_FORCE and wouldn't have honored NUMA > hinting faults already. > 2) populate_vma_page_range(): we really don't want to handle NUMA hinting > faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have > honored NUMA hinting faults already. > 3) faultin_vma_page_range(): we similarly don't want to handle NUMA > hinting faults. > > To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in > inaccessible VMAs properly, we have to perform VMA accessibility checks in > gup_can_follow_protnone(). > > As GUP-fast should reject such pages either way in > pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and > arm64 that both implement pte_protnone() -- let's just always fallback > to ordinary GUP when stumbling over pte_protnone()/pmd_protnone(). > > As Linus notes [2], honoring NUMA faults might only make sense for > selected GUP users. > > So we should really see if we can instead let relevant GUP callers specify > it manually, and not trigger NUMA hinting faults from GUP as default. > Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag > and adding appropriate documenation. > > [1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@xxxxxxxxxx > [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@xxxxxxxxxxxxxx > > Reported-by: liubo <liubo254@xxxxxxxxxx> > Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@xxxxxxxxxx > Reported-by: Peter Xu <peterx@xxxxxxxxxx> > Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/ > Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") > Cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> I agree that FOLL_REMOTE probably needs separate treatment but also agree that it's outside the context of this patch, particularly as a -stable candidate so Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> I've a minor nit below that would be nice to get fixed up, but not mandatory. > --- > include/linux/mm.h | 21 +++++++++++++++------ > include/linux/mm_types.h | 9 +++++++++ > mm/gup.c | 29 +++++++++++++++++++++++------ > mm/huge_memory.c | 2 +- > 4 files changed, 48 insertions(+), 13 deletions(-) > > diff --git a/mm/gup.c b/mm/gup.c > index 2493ffa10f4b..f463d3004ddc 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2240,6 +2244,12 @@ static bool is_valid_gup_args(struct page **pages, int *locked, > gup_flags |= FOLL_UNLOCKABLE; > } > > + /* > + * For now, always trigger NUMA hinting faults. Some GUP users like > + * KVM really require it to benefit from autonuma. > + */ > + gup_flags |= FOLL_HONOR_NUMA_FAULT; > + > /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) == > (FOLL_PIN | FOLL_GET))) Expand on *why* KVM requires it even though I suspect this changes later in the series. Maybe "Some GUP users like KVM require the hint to be as the calling context of GUP is functionally similar to a memory reference from task context"? Also minor nit -- s/autonuma/NUMA Balancing/ or numab. autonuma refers to a specific implementation of automatic balancing that operated similar to khugepaged but not merged. If you grep for it, you'll find that autonuma is only referenced in powerpc-specific code. It's not important these days but very early on, it was very confusing if AutoNUMA was mentioned when NUMAB was intended. -- Mel Gorman SUSE Labs