Gerald, can you have a look?

On 30.03.20 14:16, Peter Zijlstra wrote:
> On Sat, Mar 28, 2020 at 12:30:50PM +0800, Zhenyu Ye wrote:
>> Hi all,
>>
>> commit a6d60245 "Track which levels of the page tables have been cleared"
>> added cleared_(ptes|pmds|puds|p4ds) in struct mmu_gather, and the values
>> of them are set in some places. For example:
>>
>> In include/asm-generic/tlb.h, pte_free_tlb() set the tlb->cleared_pmds:
>> ---8<---
>> #ifndef pte_free_tlb
>> #define pte_free_tlb(tlb, ptep, address)                    \
>>     do {                                                    \
>>         __tlb_adjust_range(tlb, address, PAGE_SIZE);        \
>>         tlb->freed_tables = 1;                              \
>>         tlb->cleared_pmds = 1;                              \
>>         __pte_free_tlb(tlb, ptep, address);                 \
>>     } while (0)
>> #endif
>> ---8<---
>>
>>
>> However, in arch/s390/include/asm/tlb.h, pte_free_tlb() set the tlb->cleared_ptes:
>> ---8<---
>> static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
>>                                 unsigned long address)
>> {
>>     __tlb_adjust_range(tlb, address, PAGE_SIZE);
>>     tlb->mm->context.flush_mm = 1;
>>     tlb->freed_tables = 1;
>>     tlb->cleared_ptes = 1;
>>     /*
>>      * page_table_free_rcu takes care of the allocation bit masks
>>      * of the 2K table fragments in the 4K page table page,
>>      * then calls tlb_remove_table.
>>      */
>>     page_table_free_rcu(tlb, (unsigned long *) pte, address);
>> }
>> ---8<---
>>
>>
>> In my view, the cleared_(ptes|pmds|puds) and (pte|pmd|pud)_free_tlb
>> correspond one-to-one. So we should set cleared_ptes in pte_free_tlb(),
>> then use it when needed.
>
> So pte_free_tlb() clears a table of PTE entries, or a PMD level entity,
> also see free_pte_range(). So the generic code makes sense to me. The
> PTE level invalidations will have happened on tlb_remove_tlb_entry().
>
>> I'm very confused about this. Which is wrong? Or is there something
>> I understand wrong?
>
> I agree the s390 case is puzzling, Martin does s390 need a PTE level
> invalidate for removing a PTE table or was this a mistake?
>