Bharata B Rao <bharata@xxxxxxx> writes:

> On 9/21/2022 11:36 AM, Huang Ying wrote:
>> From: "Huang, Ying" <ying.huang@xxxxxxxxx>
>>
>> Now, migrate_pages() migrates pages one by one, like the pseudocode
>> as follows,
>>
>>   for each page
>>     unmap
>>     flush TLB
>>     copy
>>     restore map
>>
>> If multiple pages are passed to migrate_pages(), there are
>> opportunities to batch the TLB flushing and copying.  That is, we can
>> change the code to something as follows,
>>
>>   for each page
>>     unmap
>>   for each page
>>     flush TLB
>>   for each page
>>     copy
>>   for each page
>>     restore map
>>
>> The total number of TLB flushing IPIs can be reduced considerably.  And
>> we may use some hardware accelerator such as DSA to accelerate the
>> page copying.
>>
>> So in this patch, we refactor the migrate_pages() implementation and
>> implement the TLB flushing batching.  Based on this, hardware
>> accelerated page copying can be implemented.
>>
>> If too many pages are passed to migrate_pages(), in the naive batched
>> implementation, we may unmap too many pages at the same time.  The
>> possibility for a task to wait for the migrated pages to be mapped
>> again increases.  So the latency may be hurt.  To deal with this
>> issue, the max number of pages to be unmapped in a batch is restricted
>> to no more than HPAGE_PMD_NR.  That is, the influence is at the same
>> level as THP migration.
>
> Thanks for the patchset. I find it hitting the following BUG() when
> running mmtests/autonumabench:
>
> kernel BUG at mm/migrate.c:2432!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 7150 Comm: numa01 Not tainted 6.0.0-rc5+ #171
> Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.5.6 10/06/2021
> RIP: 0010:migrate_misplaced_page+0x670/0x830
> Code: 36 48 8b 3c c5 e0 7a 19 8d e8 dc 10 f7 ff 4c 89 e7 e8 f4 43 f5 ff 8b 55 bc 85 d2 75 6f 48 8b 45 c0 4c 39 e8 0f 84 b0 fb ff ff <0f> 0b 48 8b 7d 90 e9 ec fc ff ff 48 83 e8 01 e9 48 fa ff ff 48 83
> RSP: 0000:ffffb1b29ec3fd38 EFLAGS: 00010202
> RAX: ffffe946460f8248 RBX: 0000000000000001 RCX: ffffe946460f8248
> RDX: 0000000000000000 RSI: ffffe946460f8248 RDI: ffffb1b29ec3fce0
> RBP: ffffb1b29ec3fda8 R08: 0000000000000000 R09: 0000000000000005
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffe946460f8240
> R13: ffffb1b29ec3fd68 R14: 0000000000000001 R15: ffff9698beed5000
> FS:  00007fcc31fee640(0000) GS:ffff9697b0000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcc3a3a5000 CR3: 000000016e89c002 CR4: 0000000000770ee0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  __handle_mm_fault+0xb87/0xff0
>  handle_mm_fault+0x126/0x3c0
>  do_user_addr_fault+0x1ed/0x690
>  exc_page_fault+0x84/0x2c0
>  asm_exc_page_fault+0x27/0x30
> RIP: 0033:0x7fccfa1a1180
> Code: 81 fa 80 00 00 00 76 d2 c5 fe 7f 40 40 c5 fe 7f 40 60 48 83 c7 80 48 81 fa 00 01 00 00 76 2b 48 8d 90 80 00 00 00 48 83 e2 c0 <c5> fd 7f 02 c5 fd 7f 42 20 c5 fd 7f 42 40 c5 fd 7f 42 60 48 83 ea
> RSP: 002b:00007fcc31fede38 EFLAGS: 00010283
> RAX: 00007fcc39fff010 RBX: 000000000000002c RCX: 00007fccfa11ea3d
> RDX: 00007fcc3a3a5000 RSI: 0000000000000000 RDI: 00007fccf9ffef90
> RBP: 00007fcc39fff010 R08: 00007fcc31fee640 R09: 00007fcc31fee640
> R10: 00007ffdecef614f R11: 0000000000000246 R12: 00000000c0000000
> R13: 0000000000000000 R14: 00007fccfa094850 R15: 00007ffdecef6190
>
> This is BUG_ON(!list_empty(&migratepages)) in migrate_misplaced_page().

Thank you very much for reporting!  I haven't reproduced this yet.  But
I will pay special attention to this when developing the next version,
even if I cannot reproduce it in the end.

Best Regards,
Huang, Ying
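
For readers following the thread, the effect of the batching in the quoted patch description can be illustrated with a small counting model. This is only a sketch, not kernel code: the function names are hypothetical, and it assumes (as a simplification) that each batch needs exactly one combined TLB flush round, with the batch size capped in the way the description caps it at HPAGE_PMD_NR.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): count TLB flush rounds
 * for the two loop shapes in the quoted patch description.
 */

/* One-by-one: unmap, flush, copy, remap per page, so one flush per page. */
size_t flushes_one_by_one(size_t nr_pages)
{
	size_t flushes = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		/* unmap page i */
		flushes++;	/* flush TLB for this page */
		/* copy page i, then restore its mapping */
	}
	return flushes;
}

/*
 * Batched: handle at most batch_max pages per round.
 * Assumption: one combined flush covers every page unmapped in the round.
 */
size_t flushes_batched(size_t nr_pages, size_t batch_max)
{
	assert(batch_max > 0);	/* a zero cap would never make progress */

	size_t flushes = 0;
	size_t done = 0;

	while (done < nr_pages) {
		size_t n = nr_pages - done;

		if (n > batch_max)
			n = batch_max;

		/* phase 1: unmap n pages */
		flushes++;	/* phase 2: one flush for the whole batch */
		/* phase 3: copy n pages; phase 4: restore n mappings */
		done += n;
	}
	return flushes;
}
```

With a cap of 512 (the value of HPAGE_PMD_NR on x86-64 with 4 KiB base pages), 1024 pages cost 1024 flush rounds in the first model and 2 in the second. The cap is the trade-off the description mentions: a larger batch means fewer flushes, but more pages stay unmapped at the same time.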