huang ying <huang.ying.caritas@xxxxxxxxx> writes:

> Hi, Alistair,
>
> On Fri, Aug 12, 2022 at 1:23 PM Alistair Popple <apopple@xxxxxxxxxx> wrote:
>>
>> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
>> installs migration entries directly if it can lock the migrating page.
>> When removing a dirty pte the dirty bit is supposed to be carried over
>> to the underlying page to prevent it being lost.
>>
>> Currently migrate_vma_*() can only be used for private anonymous
>> mappings. That means loss of the dirty bit usually doesn't result in
>> data loss because these pages are typically not file-backed. However
>> pages may be backed by swap storage which can result in data loss if an
>> attempt is made to migrate a dirty page that doesn't yet have the
>> PageDirty flag set.
>>
>> In this case migration will fail due to unexpected references but the
>> dirty pte bit will be lost. If the page is subsequently reclaimed data
>> won't be written back to swap storage as it is considered uptodate,
>> resulting in data loss if the page is subsequently accessed.
>>
>> Prevent this by copying the dirty bit to the page when removing the pte
>> to match what try_to_migrate_one() does.
>>
>> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
>> Reported-by: Peter Xu <peterx@xxxxxxxxxx>
>> ---
>>  mm/migrate_device.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d..d38f8a6 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/export.h>
>>  #include <linux/memremap.h>
>>  #include <linux/migrate.h>
>> +#include <linux/mm.h>
>>  #include <linux/mm_inline.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/oom.h>
>> @@ -211,6 +212,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>
>>  			migrate->cpages++;
>>
>> +			/* Set the dirty flag on the folio now the pte is gone. */
>> +			if (pte_dirty(pte))
>> +				folio_mark_dirty(page_folio(page));
>> +
>
> I think that this isn't sufficient to fix all issues. Firstly, "pte"
> is assigned at the beginning of the loop, before the PTE is cleared via
> ptep_clear_flush() or ptep_get_and_clear(). That is, the PTE isn't
> read and cleared atomically. Between the "pte" assignment and the PTE
> clear, the PTE may become dirty. That is, we need to update "pte" when
> we clear the PTE.

Oh good catch, thanks. Will fix.

> And I don't know why we use ptep_get_and_clear() to clear the PTE if
> (!anon_exclusive). Why don't we need to flush the TLB?

We do the TLB flush at the end if anything was modified:

	/* Only flush the TLB if we actually modified any entries */
	if (unmapped)
		flush_tlb_range(walk->vma, start, end);

Obviously I don't think that will work correctly now given we have to
read the dirty bits and clear the PTE atomically. I assume it was
originally written this way for some sort of performance reason.

 - Alistair

> Best Regards,
> Huang, Ying

>>  			/* Setup special migration page table entry */
>>  			if (mpfn & MIGRATE_PFN_WRITE)
>>  				entry = make_writable_migration_entry(
>>
>> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
>> --
>> git-series 0.9.1
>>