Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26.02.24 09:37, Lance Yang wrote:
Hey Barry,

Thanks for taking time to review!

On Mon, Feb 26, 2024 at 12:00 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
[...]
On Mon, Feb 26, 2024 at 1:33 AM Lance Yang <ioworker0@xxxxxxxxx> wrote:
[...]
We did something similar on MADV_PAGEOUT[1]
[1] https://lore.kernel.org/linux-mm/20240118111036.72641-7-21cnbao@xxxxxxxxx/

Thanks for providing the link above.

[...]
+                        * Avoid unnecessary folio splitting if the large
+                        * folio is entirely within the given range.
+                        */
+                       folio_test_clear_dirty(folio);
+                       folio_unlock(folio);
+                       for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
+                               ptent = ptep_get(pte);
+                               if (pte_young(ptent) || pte_dirty(ptent)) {
+                                       ptent = ptep_get_and_clear_full(
+                                               mm, addr, pte, tlb->fullmm);
+                                       ptent = pte_mkold(ptent);
+                                       ptent = pte_mkclean(ptent);
+                                       set_pte_at(mm, addr, pte, ptent);
+                                       tlb_remove_tlb_entry(tlb, pte, addr);
+                               }

The code works under the assumption the large folio is entirely mapped
in all PTEs in the range. This is not always true.

This won't work in some cases as some PTEs might be mapping to the
large folios. some others might have been unmapped or mapped
to different folios.

so in MADV_PAGEOUT, we have a function to check the folio is
really entirely mapped:

+static inline bool pte_range_cont_mapped(unsigned long start_pfn,
+ pte_t *start_pte, unsigned long start_addr, int nr)
+{
+              int i;
+              pte_t pte_val;
+
+              for (i = 0; i < nr; i++) {
+                           pte_val = ptep_get(start_pte + i);
+
+                           if (pte_none(pte_val))
+                                        return false;
+
+                           if (pte_pfn(pte_val) != (start_pfn + i))
+                                        return false;
+              }
+
+              return true;
+}

Thanks for providing the information; it's very helpful to me!
I made some changes. Would you mind taking another look, please?

As a diff against this patch.

diff --git a/mm/madvise.c b/mm/madvise.c
index bcbf56595a2e..255d2f329be4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -616,6 +616,18 @@ static long madvise_pageout(struct vm_area_struct *vma,
  	return 0;
  }
+static inline bool pte_range_cont_mapped(pte_t *pte, unsigned long nr)
+{
+	pte_t pte_val;
+	unsigned long pfn = pte_pfn(pte);
+	for (int i = 0; i < nr; i++) {
+		pte_val = ptep_get(pte + i);
+		if (pte_none(pte_val) || pte_pfn(pte_val) != (pfn + i))
+			return false;
+	}
+	return true;
+}

I dislike the "cont mapped" terminology.

Maybe folio_pte_batch() does what you want?

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux