Re: [PATCH v5] mm/migrate: split source folio if it is on deferred split list


On 2024/3/26 21:26, Zi Yan wrote:
On 26 Mar 2024, at 2:19, Baolin Wang wrote:

On 2024/3/23 03:33, Zi Yan wrote:
From: Zi Yan <ziy@xxxxxxxxxx>

If the source folio is on deferred split list, it is likely some subpages
are not used. Split it before migration to avoid migrating unused subpages.

Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
did not check if a THP is on deferred split list before migration, thus,
the destination THP is never put on deferred split list even if the source
THP might be. The opportunity of reclaiming free pages in a partially
mapped THP during deferred list scanning is lost, but no other harmful
consequence is present[1].

  From v4:
1. Simplify _deferred_list check without locking and do not count as
     migration failures. (per Matthew Wilcox)

  From v3:
1. Guarded deferred list code behind CONFIG_TRANSPARENT_HUGEPAGE to avoid
     compilation error (per SeongJae Park).

  From v2:
1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].

  From v1:
1. Used dst to get correct deferred split list after migration
     (per Ryan Roberts).

[1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@xxxxxxxxxx/
[2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@xxxxxxxxxxxxxxxxxxxx/

Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
---
   mm/migrate.c | 23 +++++++++++++++++++++++
   1 file changed, 23 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index ab9856f5931b..6bd9319624a3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1652,6 +1652,29 @@ static int migrate_pages_batch(struct list_head *from,
 			cond_resched();
+			/*
+			 * The rare folio on the deferred split list should
+			 * be split now. It should not count as a failure.
+			 * Only check it without removing it from the list.
+			 * Since the folio can be on deferred_split_scan()
+			 * local list and removing it can cause the local list
+			 * corruption. Folio split process below can handle it
+			 * with the help of folio_ref_freeze().
+			 *
+			 * nr_pages > 2 is needed to avoid checking order-1
+			 * page cache folios. They exist, in contrast to
+			 * non-existent order-1 anonymous folios, and do not
+			 * use _deferred_list.
+			 */
+			if (nr_pages > 2 &&
+			   !list_empty(&folio->_deferred_list)) {
+				if (try_split_folio(folio, from) == 0) {

IMO, we should move the split folios into the 'split_folios' list instead of the 'from' list; otherwise, unhandled folios might remain on the 'from' list.

Can you elaborate on the actual situation you are thinking about? Thanks.

Sure.

Suppose the 'from' list contains only one large folio to be migrated, and that folio is on the _deferred_list, so it needs to be split. Your patch adds the split base pages back to the 'from' list. However, please look at the list_for_each_entry_safe() macro:

#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_first_entry(head, typeof(*pos), member),	\
		n = list_next_entry(pos, member);			\
	     !list_entry_is_head(pos, head, member); 			\
	     pos = n, n = list_next_entry(n, member))

The iteration terminates early because the next entry 'n', which is fetched in advance, is already the list head, so the split base pages are left on the 'from' list. This caused the following crash during my migration testing:

[  412.576943] ------------[ cut here ]------------
[  412.576947] kernel BUG at mm/migrate.c:2634!
[  412.577132] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  412.577201] CPU: 59 PID: 9581 Comm: numa01 Kdump: loaded Tainted: G E 6.9.0-rc1+ #69
........
[  412.578651] Call Trace:
[  412.578692]  <TASK>
[  412.578730]  ? die+0x33/0x90
[  412.578770]  ? do_trap+0xdf/0x110
[  412.578815]  ? migrate_misplaced_folio+0x1f2/0x2b0
[  412.578875]  ? do_error_trap+0x65/0x80
[  412.578922]  ? migrate_misplaced_folio+0x1f2/0x2b0
[  412.578977]  ? exc_invalid_op+0x4e/0x70
[  412.579048]  ? migrate_misplaced_folio+0x1f2/0x2b0
[  412.579131]  ? asm_exc_invalid_op+0x16/0x20
[  412.579182]  ? migrate_misplaced_folio+0x1f2/0x2b0
[  412.579255]  do_numa_page+0x205/0x5b0
[  412.579305]  __handle_mm_fault+0x2b0/0x6c0
[  412.579354]  handle_mm_fault+0x105/0x270
[  412.579404]  do_user_addr_fault+0x214/0x6b0
[  412.579453]  exc_page_fault+0x64/0x140
[  412.579509]  asm_exc_page_fault+0x22/0x30

2583 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
2584                             int node)
2585 {
		......

2628         if (nr_succeeded) {
2629                 count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
2630                 if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
2631                         mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
2632                                             nr_succeeded);
2633         }
2634         BUG_ON(!list_empty(&migratepages));
2635         return isolated;
2636
2637 out:

With the change below, the crash no longer occurs.

+++ b/mm/migrate.c
@@ -1668,7 +1668,7 @@ static int migrate_pages_batch(struct list_head *from,
                         */
                        if (nr_pages > 2 &&
                           !list_empty(&folio->_deferred_list)) {
-                               if (try_split_folio(folio, from) == 0) {
+                               if (try_split_folio(folio, split_folios) == 0) {
                                        stats->nr_thp_split += is_thp;
                                        stats->nr_split++;
                                        continue;
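
For illustration, the early termination described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not kernel code: the list helpers are simplified stand-ins for the kernel's list API, and struct item, NR_PIECES, and leftover_after_pass() are hypothetical names invented for this example. The loop header mirrors the shape of the "safe" iteration macro, where the next pointer is fetched before the loop body runs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail_(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

static int list_len(const struct list_head *head)
{
	const struct list_head *p;
	int len = 0;

	for (p = head->next; p != head; p = p->next)
		len++;
	return len;
}

#define NR_PIECES 3	/* hypothetical number of pieces produced by a "split" */

struct item { struct list_head lru; int large; };

/*
 * Run one "safe" pass over a 'from' list holding a single large item.
 * A large item is split: its pieces and the item itself are moved to the
 * tail of either 'from' (use_separate_list == 0) or a separate list.
 * Small items count as processed and are removed from 'from'.
 * Returns the number of entries left on 'from' after the pass.
 */
int leftover_after_pass(int use_separate_list, int *nr_on_split_list)
{
	struct item items[1 + NR_PIECES];
	struct list_head from, split, *pos, *n, *target;
	int i;

	assert(nr_on_split_list != NULL);
	init_list(&from);
	init_list(&split);
	for (i = 0; i <= NR_PIECES; i++) {
		init_list(&items[i].lru);
		items[i].large = 0;
	}
	items[0].large = 1;
	list_add_tail_(&items[0].lru, &from);
	target = use_separate_list ? &split : &from;

	/* 'n' is read before the body runs, as in the safe iteration macro. */
	for (pos = from.next, n = pos->next; pos != &from; pos = n, n = n->next) {
		struct item *it = (struct item *)((char *)pos - offsetof(struct item, lru));

		if (it->large) {
			it->large = 0;
			for (i = 1; i <= NR_PIECES; i++)
				list_add_tail_(&items[i].lru, target);
			list_del_(pos);
			list_add_tail_(pos, target);
			continue;	/* 'n' is already the head: loop ends here */
		}
		list_del_(pos);		/* small item: treated as processed */
	}

	*nr_on_split_list = list_len(&split);
	return list_len(&from);
}
```

In this sketch, leftover_after_pass(0, &nr) leaves all four entries stranded on 'from', which corresponds to the non-empty list that trips the BUG_ON() above. leftover_after_pass(1, &nr) leaves 'from' empty and collects the four entries on the separate list, assuming the caller drains that list in a later pass.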



