On 2024/3/26 21:26, Zi Yan wrote:
On 26 Mar 2024, at 2:19, Baolin Wang wrote:
On 2024/3/23 03:33, Zi Yan wrote:
From: Zi Yan <ziy@xxxxxxxxxx>
If the source folio is on deferred split list, it is likely some subpages
are not used. Split it before migration to avoid migrating unused subpages.
Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
did not check if a THP is on deferred split list before migration, thus,
the destination THP is never put on deferred split list even if the source
THP might be. The opportunity of reclaiming free pages in a partially
mapped THP during deferred list scanning is lost, but no other harmful
consequence is present[1].
From v4:
1. Simplify _deferred_list check without locking and do not count as
migration failures. (per Matthew Wilcox)
From v3:
1. Guarded deferred list code behind CONFIG_TRANSPARENT_HUGEPAGE to avoid
compilation error (per SeongJae Park).
From v2:
1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
From v1:
1. Used dst to get correct deferred split list after migration
(per Ryan Roberts).
[1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@xxxxxxxxxx/
[2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@xxxxxxxxxxxxxxxxxxxx/
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
---
mm/migrate.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index ab9856f5931b..6bd9319624a3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1652,6 +1652,29 @@ static int migrate_pages_batch(struct list_head *from,
cond_resched();
+ /*
+ * The rare folio on the deferred split list should
+ * be split now. It should not count as a failure.
+ * Only check it without removing it from the list.
+ * Since the folio can be on deferred_split_scan()
+ * local list and removing it can cause the local list
+ * corruption. Folio split process below can handle it
+ * with the help of folio_ref_freeze().
+ *
+ * nr_pages > 2 is needed to avoid checking order-1
+ * page cache folios. They exist, in contrast to
+ * non-existent order-1 anonymous folios, and do not
+ * use _deferred_list.
+ */
+ if (nr_pages > 2 &&
+ !list_empty(&folio->_deferred_list)) {
+ if (try_split_folio(folio, from) == 0) {
IMO, the split folios should be moved onto the 'split_folios' list instead of the 'from' list; otherwise, unhandled folios might remain on the 'from' list.
Can you elaborate on the actual situation you are thinking about? Thanks.
Sure.
Suppose there is only one large folio in the from list that needs to be
migrated, and this large folio is in the _deferred_list, which means it
needs to be split. Your patch will re-add the split base pages back into
the 'from' list. However, please see the list_for_each_entry_safe macro:
#define list_for_each_entry_safe(pos, n, head, member)                  \
        for (pos = list_first_entry(head, typeof(*pos), member),        \
                n = list_next_entry(pos, member);                       \
             !list_entry_is_head(pos, head, member);                    \
             pos = n, n = list_next_entry(n, member))
The iteration terminates early because the next entry 'n', which is fetched
in advance, is already the list head, so the split base pages are left
behind on the 'from' list. This caused the following crash during my
migration testing:
[ 412.576943] ------------[ cut here ]------------
[ 412.576947] kernel BUG at mm/migrate.c:2634!
[ 412.577132] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 412.577201] CPU: 59 PID: 9581 Comm: numa01 Kdump: loaded Tainted: G E 6.9.0-rc1+ #69
........
[ 412.578651] Call Trace:
[ 412.578692] <TASK>
[ 412.578730] ? die+0x33/0x90
[ 412.578770] ? do_trap+0xdf/0x110
[ 412.578815] ? migrate_misplaced_folio+0x1f2/0x2b0
[ 412.578875] ? do_error_trap+0x65/0x80
[ 412.578922] ? migrate_misplaced_folio+0x1f2/0x2b0
[ 412.578977] ? exc_invalid_op+0x4e/0x70
[ 412.579048] ? migrate_misplaced_folio+0x1f2/0x2b0
[ 412.579131] ? asm_exc_invalid_op+0x16/0x20
[ 412.579182] ? migrate_misplaced_folio+0x1f2/0x2b0
[ 412.579255] do_numa_page+0x205/0x5b0
[ 412.579305] __handle_mm_fault+0x2b0/0x6c0
[ 412.579354] handle_mm_fault+0x105/0x270
[ 412.579404] do_user_addr_fault+0x214/0x6b0
[ 412.579453] exc_page_fault+0x64/0x140
[ 412.579509] asm_exc_page_fault+0x22/0x30
2583 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
2584                             int node)
2585 {
......
2628         if (nr_succeeded) {
2629                 count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
2630                 if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
2631                         mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
2632                                             nr_succeeded);
2633         }
2634         BUG_ON(!list_empty(&migratepages));
2635         return isolated;
2636
2637 out:
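To make the early termination easier to see, here is a small userspace sketch of the same walk. It is only a model, not the kernel code: the list helpers are simplified stand-ins for <linux/list.h>, `struct fake_folio`, `NR_TAIL` and `entries_left_on_from()` are hypothetical names made up for illustration, and the "split" step assumes that the tail pages and then the head folio are appended to the tail of the target list. It relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_entry_is_head(pos, head, member) (&(pos)->member == (head))
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_first_entry(head, __typeof__(*pos), member),	\
	     n = list_next_entry(pos, member);				\
	     !list_entry_is_head(pos, head, member);			\
	     pos = n, n = list_next_entry(n, member))

/* Hypothetical model of a folio: just a list node and a page count. */
struct fake_folio { struct list_head lru; int nr_pages; };

#define NR_TAIL 3	/* tail pages produced by the modelled split */

/*
 * Walk 'from' once. A large folio is "split": its tail pages and then
 * the folio itself are appended to the target list (an assumption of
 * this model). Base pages are "migrated" by deleting them from the
 * list. Returns the number of entries still on 'from' after the walk.
 */
static int entries_left_on_from(int use_split_folios)
{
	struct list_head from, split_folios, *target, *p;
	struct fake_folio head = { .nr_pages = 1 + NR_TAIL };
	struct fake_folio tail[NR_TAIL];
	struct fake_folio *folio, *next;
	int i, left = 0;

	INIT_LIST_HEAD(&from);
	INIT_LIST_HEAD(&split_folios);
	list_add_tail(&head.lru, &from);
	assert(from.next == &head.lru);	/* exactly one large folio queued */
	target = use_split_folios ? &split_folios : &from;

	list_for_each_entry_safe(folio, next, &from, lru) {
		if (folio->nr_pages > 1) {
			for (i = 0; i < NR_TAIL; i++) {
				tail[i].nr_pages = 1;
				list_add_tail(&tail[i].lru, target);
			}
			folio->nr_pages = 1;
			list_del(&folio->lru);
			list_add_tail(&folio->lru, target);
			/* 'next' was fetched before the split: it is the head */
			continue;
		}
		list_del(&folio->lru);	/* "migrate" a base page */
	}

	for (p = from.next; p != &from; p = p->next)
		left++;
	return left;
}
```

In this model, entries_left_on_from(0), which splits back onto 'from', leaves all 1 + NR_TAIL base pages on 'from', because 'next' already pointed at the list head when the split happened. entries_left_on_from(1), which splits onto a separate list, leaves 'from' empty.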
With the change below, the crash no longer occurs.
+++ b/mm/migrate.c
@@ -1668,7 +1668,7 @@ static int migrate_pages_batch(struct list_head *from,
*/
if (nr_pages > 2 &&
!list_empty(&folio->_deferred_list)) {
- if (try_split_folio(folio, from) == 0) {
+ if (try_split_folio(folio, split_folios) == 0) {
stats->nr_thp_split += is_thp;
stats->nr_split++;
continue;