"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> writes: > On Mon, May 04, 2015 at 10:59:16PM +0530, Aneesh Kumar K.V wrote: >> Archs like ppc64 require pte_t * to remain stable in some code path. >> They use local_irq_disable to prevent a parallel split. Generic code >> clear pmd instead of marking it _PAGE_SPLITTING in code path >> where we can afford to mark pmd none before splitting. Use a >> variant of pmdp_splitting_clear_notify that arch can override. >> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > > Sorry, I still try wrap my head around this problem. > > So, Power has __find_linux_pte_or_hugepte() which does lock-less lookup in > page tables with local interrupts disabled. For huge pages it casts pmd_t > to pte_t. Since format of pte_t is different from pmd_t we want to prevent > transit from pmd pointing to page table to pmd pinging to huge page (and > back) while interrupts are disabled. > > The complication for Power is that it doesn't do implicit IPI on tlb > flush. > s/doesn't do/doesn't need to do/ > Is it correct? that is correct. I will add more info to the commit message of the patch I will end up doing. > > For THP, split_huge_page() and collapse sides are covered. This patch > should address two cases of splitting PMD, but not compound page in > current upstream. > > But I think there's still *big* problem for Power -- zap_huge_pmd(). > > For instance: other CPU can shoot out a THP PMD with MADV_DONTNEED and > fault in small pages instead. IIUC, for __find_linux_pte_or_hugepte(), > it's equivalent of splitting. > > I don't see how this can be fixed without kick_all_cpus_sync() in all > pmdp_clear_flush() on Power. > Yes we could run into issue with that. Thanks for catching this. Now i am not sure whether we want to do the kick_all_cpus_sync in pmdp_get_and_clear. We do use that function while updating huge pte. The one i am looking at is change_huge_pmd. We don't need a IPI there and we would really like to avoid the IPI. Any idea why we follow the sequence of pmd_clear and set_pmd, instead of pmd_update there ? I looked at code paths we are clearing pmd where we would not require an IPI. Listing them move_huge_pmd do_huge_pmd_wp_page migrate_misplace_transhuge_page change_huge_pmd. Of this IIUC change_huge_pmd may be called more frequently and hence we may want to avoid doing kick_all_cpus_sync there ? One way to fix that would be switch change_huge_pmd to pmd_update and then we could do a kick_all_cpus_sync in pmdp_get_and_clear. -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>