On Aug 15, 2022, at 12:18 PM, Peter Xu <peterx@xxxxxxxxxx> wrote: > On Fri, Aug 12, 2022 at 10:32:48AM +0800, Huang, Ying wrote: >> Peter Xu <peterx@xxxxxxxxxx> writes: >> >>> On Tue, Aug 09, 2022 at 06:00:58PM -0400, Peter Xu wrote: >>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c >>>> index 27fb37d65476..699f821b8443 100644 >>>> --- a/mm/migrate_device.c >>>> +++ b/mm/migrate_device.c >>>> @@ -221,6 +221,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, >>>> else >>>> entry = make_readable_migration_entry( >>>> page_to_pfn(page)); >>>> + if (pte_young(pte)) >>>> + entry = make_migration_entry_young(entry); >>>> + if (pte_dirty(pte)) >>>> + entry = make_migration_entry_dirty(entry); >>>> swp_pte = swp_entry_to_pte(entry); >>>> if (pte_present(pte)) { >>>> if (pte_soft_dirty(pte)) >>> >>> This change needs to be wrapped with pte_present() at least.. >>> >>> I also just noticed that this change probably won't help anyway because: >>> >>> (1) When ram->device, the pte will finally be replaced with a device >>> private entry, and device private entry does not yet support A/D, it >>> means A/D will be dropped again, >>> >>> (2) When device->ram, we are missing information on either A/D bits, or >>> even if device private entries start to suport A/D, it's still not >>> clear whether we should take device read/write into considerations >>> too on the page A/D bits to be accurate. >>> >>> I think I'll probably keep the code there for completeness, but I think it >>> won't really help much until more things are done. >> >> It appears that there are more issues. Between "pte = *ptep" and pte >> clear, CPU may set A/D bit in PTE, so we may need to update pte when >> clearing PTE. > > Agreed, I didn't see it a huge problem with current code, but it should be > better in that way. > >> And I don't find the TLB is flushed in some cases after PTE is cleared. > > I think it's okay to not flush tlb if pte not present. But maybe you're > talking about something else? I think Huang refers to situation in which the PTE is cleared, still not flushed, and then A/D is being set by the hardware. At least on x86, the hardware is not supposed to do so. The only case I remember (and sometimes misremembers) is with KNL erratum, which perhaps needs to be considered: https://lore.kernel.org/all/20160708001911.9A3FD2B6@xxxxxxxxxxxxxxxxxx/