Hugh Dickins <hughd@xxxxxxxxxx> writes:

> On Tue, 28 Feb 2023, Huang, Ying wrote:
>> Hugh Dickins <hughd@xxxxxxxxxx> writes:
>> > On Fri, 24 Feb 2023, Huang Ying wrote:
>> >>
>> >> diff --git a/mm/migrate.c b/mm/migrate.c
>> >> index 91198b487e49..c17ce5ee8d92 100644
>> >> --- a/mm/migrate.c
>> >> +++ b/mm/migrate.c
>> >> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>> >>  	return rc;
>> >>  }
>> >>
>> >> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
>> >> +		free_page_t put_new_page, unsigned long private,
>> >> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>> >> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
>> >> +{
>> >> +	int rc, nr_failed = 0;
>> >> +	LIST_HEAD(folios);
>> >> +	struct migrate_pages_stats astats;
>> >> +
>> >> +	memset(&astats, 0, sizeof(astats));
>> >> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>> >> +				 reason, &folios, split_folios, &astats,
>> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>> >
>> > I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
>> >
>> > Though I've never got down to adjusting that number (and it's not a job
>> > to be done in this set of patches), those 10 retries sometimes terrify
>> > me, from a latency point of view. They can have such different weights:
>> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
>> > into 1000 processes, the thought of all that unmapping and TLB flushing
>> > and remapping is terrifying.
>> >
>> > Since you're retrying below, halve both numbers of retries for now?
>>
>> Yes. These are reasonable concerns.
>>
>> And in the original implementation, we only wait to lock page and wait
>> the writeback to complete if pass > 2. This is kind of trying to
>> migrate asynchronously for 3 times before the real synchronous
>> migration. So, should we delete the "force" logic (in
>> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
>> batch before migrating synchronously for 7 times one by one?
>
> Oh, that's a good idea (but please don't imagine I've thought it through):
> I hadn't realized the way in which your migrate_pages_sync() addition is
> kind of duplicating the way that the "force" argument conditions behaviour,
> It would be very appealing to delete the "force" argument now if you can.

Sure. I will do that in the next version.

> But aside from that, you've also made me wonder (again, please remember I
> don't have a good picture of the new migrate_pages() sequence in my head)
> whether you have already made a *great* strike against my 10 retries
> terror. Am I reading it right, that the unmapping is now done on the
> first try, and the remove_migration_ptes after the last try (all the
> pages involved having remained locked throughout)?

Yes, you are right. Unmapping and moving are now two separate steps,
and they are retried separately. Once a folio has been unmapped
successfully, we no longer remap and unmap it up to 10 times when the
folio is pinned and therefore fails to move in migrate_folio_move(). So
the latency caused by retrying is much lower now.

But I would still prefer to keep the total number of retries as before.
Do you agree?

Best Regards,
Huang, Ying