Re: [PATCH v6 02/15] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 3, 2022 at 5:40 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> When scanning an anon pmd to see if it's eligible for collapse, return
> SCAN_PMD_MAPPED if the pmd already maps a THP.  Note that
> SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
> file-collapse path, since the latter might identify pte-mapped compound
> pages.  This is required by MADV_COLLAPSE which necessarily needs to
> know what hugepage-aligned/sized regions are already pmd-mapped.
>
> Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
> ---
>  include/trace/events/huge_memory.h |  1 +
>  mm/internal.h                      |  1 +
>  mm/khugepaged.c                    | 32 ++++++++++++++++++++++++++----
>  mm/rmap.c                          | 15 ++++++++++++--
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index d651f3437367..55392bf30a03 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -11,6 +11,7 @@
>         EM( SCAN_FAIL,                  "failed")                       \
>         EM( SCAN_SUCCEED,               "succeeded")                    \
>         EM( SCAN_PMD_NULL,              "pmd_null")                     \
> +       EM( SCAN_PMD_MAPPED,            "page_pmd_mapped")              \
>         EM( SCAN_EXCEED_NONE_PTE,       "exceed_none_pte")              \
>         EM( SCAN_EXCEED_SWAP_PTE,       "exceed_swap_pte")              \
>         EM( SCAN_EXCEED_SHARED_PTE,     "exceed_shared_pte")            \
> diff --git a/mm/internal.h b/mm/internal.h
> index 6e14749ad1e5..f768c7fae668 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -188,6 +188,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
>  /*
>   * in mm/rmap.c:
>   */
> +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
>  extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>
>  /*
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cc3d6fb446d5..7a914ca19e96 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -28,6 +28,7 @@ enum scan_result {
>         SCAN_FAIL,
>         SCAN_SUCCEED,
>         SCAN_PMD_NULL,
> +       SCAN_PMD_MAPPED,
>         SCAN_EXCEED_NONE_PTE,
>         SCAN_EXCEED_SWAP_PTE,
>         SCAN_EXCEED_SHARED_PTE,
> @@ -901,6 +902,31 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>         return 0;
>  }
>
> +static int find_pmd_or_thp_or_none(struct mm_struct *mm,
> +                                  unsigned long address,
> +                                  pmd_t **pmd)
> +{
> +       pmd_t pmde;
> +
> +       *pmd = mm_find_pmd_raw(mm, address);
> +       if (!*pmd)
> +               return SCAN_PMD_NULL;
> +
> +       pmde = pmd_read_atomic(*pmd);
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +       /* See comments in pmd_none_or_trans_huge_or_clear_bad() */
> +       barrier();
> +#endif
> +       if (!pmd_present(pmde))
> +               return SCAN_PMD_NULL;
> +       if (pmd_trans_huge(pmde))
> +               return SCAN_PMD_MAPPED;
> +       if (pmd_bad(pmde))
> +               return SCAN_FAIL;

khugepaged doesn't handle pmd_bad before, IIRC it may just return
SCAN_SUCCEED if everything else is good? It is fine to add it, but it
may be better to return SCAN_PMD_NULL?


> +       return SCAN_SUCCEED;
> +}
> +
>  /*
>   * Bring missing pages in from swap, to complete THP collapse.
>   * Only done if khugepaged_scan_pmd believes it is worthwhile.
> @@ -1146,11 +1172,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>
>         VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>
> -       pmd = mm_find_pmd(mm, address);
> -       if (!pmd) {
> -               result = SCAN_PMD_NULL;
> +       result = find_pmd_or_thp_or_none(mm, address, &pmd);
> +       if (result != SCAN_SUCCEED)

There are a couple of other callsites for mm_find_pmd(), you may need
to change all of them to find_pmd_or_thp_or_none() for MADV_COLLAPSE
since khugepaged may collapse the area before MADV_COLLAPSE
reacquiring mmap_lock IIUC and MADV_COLLAPSE does care this case. It
is fine w/o MADV_COLLAPSE since khupaged doesn't care if it is PMD
mapped or not.

So it may be better to move this patch right before MADV_COLLAPSE is introduced?

>                 goto out;
> -       }
>
>         memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
>         pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 04fac1af870b..c9979c6ad7a1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -767,13 +767,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
>         return vma_address(page, vma);
>  }
>
> -pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)

May be better to have some notes for mm_find_pmd_raw() and mm_find_pmd().

>  {
>         pgd_t *pgd;
>         p4d_t *p4d;
>         pud_t *pud;
>         pmd_t *pmd = NULL;
> -       pmd_t pmde;
>
>         pgd = pgd_offset(mm, address);
>         if (!pgd_present(*pgd))
> @@ -788,6 +787,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
>                 goto out;
>
>         pmd = pmd_offset(pud, address);
> +out:
> +       return pmd;
> +}
> +
> +pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> +{
> +       pmd_t pmde;
> +       pmd_t *pmd;
> +
> +       pmd = mm_find_pmd_raw(mm, address);
> +       if (!pmd)
> +               goto out;
>         /*
>          * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
>          * without holding anon_vma lock for write.  So when looking for a
> --
> 2.36.1.255.ge46751e96f-goog
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux