From: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>
Date: Wed, 3 Feb 2016 15:00:23 -0800

> During hugepage unmap, TSB and TLB flushes are currently
> issued at every PAGE_SIZE'd boundary which is unnecessary.
> We now issue the flush at REAL_HPAGE_SIZE boundaries only.
>
> Without this patch workloads which unmap a large hugepage
> backed VMA region get CPU lockups due to excessive TLB
> flush calls.
>
> Signed-off-by: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>

I thought a lot about this stuff tonight, and I think we need to be
more intelligent about this.

Doing a synchronous flush unconditionally is not good.  In particular,
we aren't even checking if the original PTE was mapped or not, which
is going to be the most common case when a new mapping is created.

Also, we can't skip the D-cache flushes that older cpus need, as done
by tlb_batch_add().

Therefore let's teach the TLB batcher what we're actually trying to do
and what the optimization is, instead of trying so hard to bypass it
altogether.

In asm/pgtable_64.h provide is_hugetlb_pte(). I'd implement it like
this:

static inline unsigned long __pte_huge_mask(void)
{
	unsigned long mask;

	__asm__ __volatile__(
	"\n661: sethi %%uhi(%1), %0\n"
	" sllx %0, 32, %0\n"
	" .section .sun4v_2insn_patch, \"ax\"\n"
	" .word 661b\n"
	" mov %2, %0\n"
	" nop\n"
	" .previous\n"
	: "=r" (mask)
	: "i" (_PAGE_SZHUGE_4U), "i" (_PAGE_SZHUGE_4V));

	return mask;
}

Then pte_mkhuge() becomes:

static inline pte_t pte_mkhuge(pte_t pte)
{
	return __pte(pte_val(pte) | __pte_huge_mask());
}

and then:

static inline bool is_hugetlb_pte(pte_t pte)
{
	return (pte_val(pte) & __pte_huge_mask());
}

And then tlb_batch_add() can detect if the original PTE is huge:

	bool huge = is_hugetlb_pte(orig);

and then the end of the function is:

	if (huge && (vaddr & (REAL_HPAGE_SIZE - 1)))
		return;

	if (!fullmm)
		tlb_batch_add_one(mm, vaddr, pte_exec(orig), huge);

and tlb_batch_add_one() takes 'huge' and uses it to drive the flushing.

For a synchronous flush, we pass it down to flush_tsb_user_page().

For a batched flush we store it in tlb_batch, and any time 'huge'
changes, we do a flush_tlb_pending(), just the same as if tb->tlb_nr
hit TLB_BATCH_NR.

Then flush_tsb_user() uses 'tb->huge' to decide whether to flush
MM_TSB_BASE or not.
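
To make the batching rule concrete, here is a rough user-space model of
it.  This is only a sketch, not kernel code: every sketch_* name, the
counters, and the constants (8K base pages, 4M real huge pages, a batch
of 4) are assumptions made up for illustration, and the counters merely
stand in for the real TSB/TLB flush work.  Counting a huge batch against
a separate huge TSB is likewise an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; the real ones live in the sparc64 headers. */
#define SKETCH_PAGE_SIZE	(1UL << 13)	/* 8K base page */
#define SKETCH_REAL_HPAGE_SIZE	(1UL << 22)	/* 4M real huge page */
#define SKETCH_BATCH_NR		4		/* stand-in for TLB_BATCH_NR */

struct sketch_batch {
	bool huge;			/* page size class of the queued entries */
	size_t nr;			/* number of queued addresses */
	unsigned long vaddrs[SKETCH_BATCH_NR];
	unsigned long flushes;		/* how often the batch was drained */
	unsigned long base_tsb_flushes;	/* drains that would touch the base TSB */
	unsigned long huge_tsb_flushes;	/* drains that would skip the base TSB */
};

/* Drain the batch; 'huge' selects which TSB the drain is counted against. */
static void sketch_flush_pending(struct sketch_batch *tb)
{
	if (tb->nr == 0)
		return;
	tb->flushes++;
	if (tb->huge)
		tb->huge_tsb_flushes++;
	else
		tb->base_tsb_flushes++;
	tb->nr = 0;
}

static void sketch_batch_add_one(struct sketch_batch *tb,
				 unsigned long vaddr, bool huge)
{
	/* A change in 'huge' drains first, so a batch never mixes page sizes. */
	if (tb->nr != 0 && tb->huge != huge)
		sketch_flush_pending(tb);
	tb->huge = huge;

	assert(tb->nr < SKETCH_BATCH_NR);
	tb->vaddrs[tb->nr++] = vaddr;

	/* A full batch drains exactly as before. */
	if (tb->nr == SKETCH_BATCH_NR)
		sketch_flush_pending(tb);
}

static void sketch_batch_add(struct sketch_batch *tb,
			     unsigned long vaddr, bool huge)
{
	/* For a huge PTE, only the real huge page boundary queues a flush. */
	if (huge && (vaddr & (SKETCH_REAL_HPAGE_SIZE - 1)))
		return;
	sketch_batch_add_one(tb, vaddr, huge);
}
```

In this model, walking one 4M region in 8K steps with huge set queues a
single entry instead of 512, and the next non-huge address drains that
entry before it is queued itself.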