Re: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT

Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> · Wed, 2 Aug 2023 16:28:47 +0100

On Tue, Aug 01, 2023 at 02:48:40PM +0200, David Hildenbrand wrote:
> Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
> gup/gup_fast") from 2012 documented as the primary reason why we would want
> to handle NUMA hinting faults from GUP:
> 
>   KVM secondary MMU page faults will trigger the NUMA hinting page
>   faults through gup_fast -> get_user_pages -> follow_page ->
>   handle_mm_fault.
> 
> That is still the case today, and relevant KVM code has been converted to
> manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
> FOLL_HONOR_NUMA_FAULT for all GUP users and cross fingers that not that
> many other ones that really require such handling for autonuma remain.
> 
> Possible interaction with MMU notifiers:
> 
>  Assume a driver obtains a page using get_user_pages() to map it into
>  a secondary MMU, and uses the MMU notifier framework to get notified on
>  changes.
> 
>  Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
>  FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
>  mapped into a secondary MMU. Once user space would turn that mapping
>  inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
>  might not change. If the MMU notifier would be smart and optimize for that
>  case "why notify if the PTE didn't change", that could be problematic.
> 
>  At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
>  unconditional mmu_notifier_invalidate_range_start() ->
>  mmu_notifier_invalidate_range_end() and should be fine.
> 
>  Note that even if a PTE in an accessible VMA is pte_protnone(), the
>  underlying page might be accessed by a secondary MMU that does not set
>  FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".
> 
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Also seems sane, but a large portion of its correctness depends on
patch 3 being correct.
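
For reference, the behaviour change described in the changelog can be
sketched as a small userspace model. This is only an illustration, not
the kernel code: the flag value, the model_pte struct and the
model_gup_can_follow() helper are all made up for the example, and it
only covers the accessible-VMA case the changelog talks about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value for this model; the kernel's real value differs. */
#define FOLL_HONOR_NUMA_FAULT 0x1u

/* Simplified stand-in for a PTE in an accessible VMA (assumption). */
struct model_pte {
	bool protnone;	/* true where pte_protnone() would be true */
};

/*
 * Returns true if GUP may hand out the page directly, false if it
 * would first have to go through the fault path so the NUMA hinting
 * fault gets handled.
 */
bool model_gup_can_follow(const struct model_pte *pte, unsigned int flags)
{
	/* An ordinary present PTE needs no NUMA hinting fault. */
	if (!pte->protnone)
		return true;

	/* Callers that do not ask for hinting faults just take the page. */
	if (!(flags & FOLL_HONOR_NUMA_FAULT))
		return true;

	/* The caller opted in (e.g. KVM): trigger the hinting fault. */
	return false;
}

/* Walks through the cases from the changelog. */
void model_demo(void)
{
	struct model_pte normal = { .protnone = false };
	struct model_pte hinting = { .protnone = true };

	/* Generic GUP user after this patch: succeeds on a protnone PTE. */
	assert(model_gup_can_follow(&hinting, 0));

	/* A user that sets the flag still gets the hinting fault. */
	assert(!model_gup_can_follow(&hinting, FOLL_HONOR_NUMA_FAULT));

	/* Non-protnone PTEs are unaffected either way. */
	assert(model_gup_can_follow(&normal, 0));
	assert(model_gup_can_follow(&normal, FOLL_HONOR_NUMA_FAULT));
}
```

The point of the model is that after this patch only callers passing
FOLL_HONOR_NUMA_FAULT end up in the fault path for such PTEs, which is
why the result depends on every user that needs the hinting fault
having been converted to set the flag.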

-- 
Mel Gorman
SUSE Labs