On Tue, Jan 25, 2022 at 10:42 AM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On 24 Jan 2022, at 20:55, Muchun Song wrote:
>
> > On Tue, Jan 25, 2022 at 3:22 AM Zi Yan <ziy@xxxxxxxxxx> wrote:
> >>
> >> On 24 Jan 2022, at 13:11, David Rientjes wrote:
> >>
> >>> On Mon, 24 Jan 2022, Muchun Song wrote:
> >>>
> >>>> The D-cache maintenance inside move_to_new_page() only consider one page,
> >>>> there is still D-cache maintenance issue for tail pages of THP. Fix this
> >>>> by not using flush_dcache_folio() since it is not backportable.
> >>>>
> >>>
> >>> The mention of being backportable suggests that we should backport this,
> >>> likely to 4.14+. So should it be marked as stable?
> >>
> >> Hmm, after more digging, I am not sure if the bug exists. For THP migration,
> >> flush_cache_range() is used in remove_migration_pmd(). The flush_dcache_page()
> >> was added by Lars Persson (cc’d) to solve the data corruption on MIPS[1],
> >> but THP migration is only enabled on x86_64, PPC_BOOK3S_64, and ARM64.
> >
> > I only mention the THP case. After some more thinking, I think the HugeTLB
> > should also be considered, Right? The HugeTLB is enabled on arm, arm64,
> > mips, parisc, powerpc, riscv, s390 and sh.
> >
> > +Mike for HugeTLB
>
> If HugeTLB page migration also misses flush_dcache_page() on its tail pages,
> you will need a different patch for the commit introducing hugetlb page migration.

Agreed. I think arm (see the following commit) has handled this issue, while
most other architectures have not.

  commit 0b19f93351dd ("ARM: mm: Add support for flushing HugeTLB pages.")

But I do not have any real devices to test whether this issue exists on other
archs. In theory, it exists.

>
> >>
> >> To make code more consistent, I guess flush_cache_range() in remove_migration_pmd()
> >> can be removed, since it is superseded by the flush_dcache_page() below.
> >
> > From my point of view, flush_cache_range() in remove_migration_pmd() is
> > a wrong usage, which cannot replace flush_dcache_page(). I think the commit
> > c2cc499c5bcf ("mm compaction: fix of improper cache flush in migration code")
> > , which is similar to the situation here, can offer more infos.
> >
>
> Thanks for the information. That helps. But remove_migration_pmd() did not cause
> any issue at the commit pointed by Fixes but at the commit which enabled THP
> migration on IBM and ARM64, whichever came first.
>
> IIUC, there will be different versions of the fix targeting different stable
> trees:
>
> 1. pre-4.14, THP migration did not exist: you will need to fix the use of
>    flush_dcache_page() at that time for HugeTLB page migration. Both flushing
>    dcache page for all subpages and moving flush_dcache_page from
>    remove_migration_pte() to move_to_new_page(). 4.9 and 4.4 are affected.
>    But EOL of 4.4 is next month, so you might skip it.
>
> 2. 4.14 to before device public page is removed: your current fix will not
>    apply directly, but the for loop works. flush_cache_range() in
>    remove_migration_pmd() should be removed, since it is dead code based on
>    the commit you mentioned. It might not be worth the effort to find when
>    IBM and ARM64 enable THP migration.
>
> 3. after device public page is removed: your current fix will apply cleanly
>    and the removal of flush_cache_range() in remove_migration_pmd() should
>    be added.
>
> Let me know if it makes sense.

Makes sense. Thanks.
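
For reference, the "for loop" idea discussed above (flushing the D-cache for
the head page and every tail page, instead of only one page) can be sketched
in userspace as below. This is only a minimal mock under stated assumptions,
not the actual patch: struct page and flush_dcache_page() here are stand-ins
for the kernel ones, flush_dcache_compound() is a hypothetical helper name,
and the subpages are assumed to sit contiguously in one array.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct page (assumption, not the real layout). */
struct page {
	int dcache_flushed;	/* how many times this page has been flushed */
};

/*
 * Mock of flush_dcache_page(). The real function is architecture specific;
 * here it only records that the page was flushed.
 */
static void flush_dcache_page(struct page *page)
{
	page->dcache_flushed++;
}

/*
 * Hypothetical helper: flush the head page and all of its tail pages.
 * Assumes the nr_pages subpages are contiguous starting at head.
 */
static void flush_dcache_compound(struct page *head, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		flush_dcache_page(head + i);
}
```

With 4 KiB base pages, a 2 MiB huge page has 512 subpages, so a call such as
flush_dcache_compound(head, 512) would touch the head and all 511 tail pages,
while a normal page would be handled with nr_pages == 1.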