Dev Jain <dev.jain@xxxxxxx> writes: > On 8/13/24 12:52, Dev Jain wrote: >> >> On 8/13/24 10:30, Dev Jain wrote: >>> >>> On 8/12/24 17:38, Dev Jain wrote: >>>> >>>> On 8/12/24 13:01, Huang, Ying wrote: >>>>> Dev Jain <dev.jain@xxxxxxx> writes: >>>>> >>>>>> On 8/12/24 11:45, Huang, Ying wrote: >>>>>>> Dev Jain <dev.jain@xxxxxxx> writes: >>>>>>> >>>>>>>> On 8/12/24 11:04, Huang, Ying wrote: >>>>>>>>> Hi, Dev, >>>>>>>>> >>>>>>>>> Dev Jain <dev.jain@xxxxxxx> writes: >>>>>>>>> >>>>>>>>>> As already being done in __migrate_folio(), wherein we >>>>>>>>>> backoff if the >>>>>>>>>> folio refcount is wrong, make this check during the >>>>>>>>>> unmapping phase, upon >>>>>>>>>> the failure of which, the original state of the PTEs will be >>>>>>>>>> restored and >>>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), >>>>>>>>>> any racing >>>>>>>>>> thread will make progress and migration will be retried. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx> >>>>>>>>>> --- >>>>>>>>>> mm/migrate.c | 9 +++++++++ >>>>>>>>>> 1 file changed, 9 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>>>>>>> index e7296c0fb5d5..477acf996951 100644 >>>>>>>>>> --- a/mm/migrate.c >>>>>>>>>> +++ b/mm/migrate.c >>>>>>>>>> @@ -1250,6 +1250,15 @@ static int >>>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio, >>>>>>>>>> } >>>>>>>>>> if (!folio_mapped(src)) { >>>>>>>>>> + /* >>>>>>>>>> + * Someone may have changed the refcount and maybe >>>>>>>>>> sleeping >>>>>>>>>> + * on the folio lock. In case of refcount mismatch, >>>>>>>>>> bail out, >>>>>>>>>> + * let the system make progress and retry. >>>>>>>>>> + */ >>>>>>>>>> + struct address_space *mapping = folio_mapping(src); >>>>>>>>>> + >>>>>>>>>> + if (folio_ref_count(src) != >>>>>>>>>> folio_expected_refs(mapping, src)) >>>>>>>>>> + goto out; >>>>>>>>>> __migrate_folio_record(dst, old_page_state, >>>>>>>>>> anon_vma); >>>>>>>>>> return MIGRATEPAGE_UNMAP; >>>>>>>>>> } >>>>>>>>> Do you have some test results for this? For example, after >>>>>>>>> applying the >>>>>>>>> patch, the migration success rate increased XX%, etc. >>>>>>>> I'll get back to you on this. >>>>>>>> >>>>>>>>> My understanding for this issue is that the migration success >>>>>>>>> rate can >>>>>>>>> increase if we undo all changes before retrying. This is the >>>>>>>>> current >>>>>>>>> behavior for sync migration, but not for async migration. If >>>>>>>>> so, we can >>>>>>>>> use migrate_pages_sync() for async migration too to increase >>>>>>>>> success >>>>>>>>> rate? Of course, we need to change the function name and >>>>>>>>> comments. >>>>>>>> As per my understanding, this is not the current behaviour for sync >>>>>>>> migration. After successful unmapping, we fail in >>>>>>>> migrate_folio_move() >>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop >>>>>>>> around >>>>>>>> migrate_folio_move() futile), we do not push the failed folio >>>>>>>> onto the >>>>>>>> ret_folios list, therefore, in _sync(), _batch() is never >>>>>>>> tried again. >>>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be >>>>>>> called first, if failed, the folio will be restored to the original >>>>>>> state (unlocked). Then migrate_pages_batch(,_SYNC*) is called >>>>>>> again. >>>>>>> So, we unlock once. If it's necessary, we can unlock more times via >>>>>>> another level of loop. >>>>>> Yes, that's my point. We need to undo src+dst and retry. >>>>> For sync migration, we undo src+dst and retry now, but only once. You >>>>> have shown that more retrying increases success rate. >>>>> >>>>>> We will have >>>>>> to decide where we want this retrying to be; do we want to change the >>>>>> return value, end up in the while loop wrapped around _sync(), >>>>>> and retry >>>>>> there by adding another level of loop, or do we want to make use >>>>>> of the >>>>>> existing retry loops, one of which is wrapped around _unmap(); >>>>>> the latter >>>>>> is my approach. The utility I see for the former approach is >>>>>> that, in case >>>>>> of a large number of page migrations (which should usually be >>>>>> the case), >>>>>> we are giving more time for the folio to get retried. The latter >>>>>> does not >>>>>> give much time and discards the folio if it did not succeed >>>>>> under 7 times. >>>>> Because it's a race, I guess that most folios will be migrated >>>>> successfully in the first pass. >>>>> >>>>> My concerns of your method are that it deal with just one case >>>>> specially. While retrying after undoing all appears more general. >>>> >>>> >>>> Makes sense. Also, please ignore my "change the return value" >>>> thing, I got confused between unmap_folios, ret_folios, etc. >>>> Now I think I understood what the lists are doing :) >>>> >>>>> >>>>> If it's really important to retry after undoing all, we can either >>>>> convert two retying loops of migrate_pages_batch() into one loop, or >>>>> remove retry loop in migrate_pages_batch() and retry in its caller >>>>> instead. >>>> >>>> And if I implemented this correctly, the following makes the test >>>> pass always: >>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe >>> >>> >>> Okay, I did mess up with the implementation, leading to a false >>> positive. Let me try again :) >> >> >> Hopefully this should do the job: >> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u >> >> But the result is worse than the patch proposed; I rarely hit >> a 3 digit number of successes of move_pages(). But, on a >> base kernel without any changes, when I apply David's >> suggestion to change the test, if I choose 7 as the number >> of retries (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I >> can touch even 4 digits. I am puzzled. >> We can also try merging the for loops of unmap and move... > > > If people are okay with this change, I guess I can send it as > a v2? I concur with your assessment that my initial approach > is solving a specific case; the above approach does give me > a slight improvement on arm64 and should be an improvement > in general, since it makes sense to defer retrying the failed folio > as much as we can. We need to deal with something else before a formal v2, - stats need to be fixed, please check result processing for the first loop of migrate_pages_sync(). - Do we need something similar for async migration. - Can we add another level of explicit loop for the second loop of migrate_pages_sync()? That is to improve code readability. Or, add a function to dot that? - Is it good to remove retry loop in migrate_pages_batch()? And do retry in the caller? -- Best Regards, Huang, Ying