On 03/20/2016 09:28 PM, David Miller wrote:
> From: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>
> Date: Wed, 3 Feb 2016 15:00:23 -0800
>
>> During hugepage unmap, TSB and TLB flushes are currently
>> issued at every PAGE_SIZE'd boundary, which is unnecessary.
>> We now issue the flush at REAL_HPAGE_SIZE boundaries only.
>>
>> Without this patch, workloads which unmap a large hugepage
>> backed VMA region get CPU lockups due to excessive TLB
>> flush calls.
>>
>> Signed-off-by: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>
>
> I thought a lot about this stuff tonight, and I think we need to be
> more intelligent about this.
>
> Doing a synchronous flush unconditionally is not good. In particular,
> we aren't even checking if the original PTE was mapped or not, which
> is going to be the most common case when a new mapping is created.
>
> Also, we can't skip the D-cache flushes that older cpus need, as done
> by tlb_batch_add().
>
> Therefore let's teach the TLB batcher what we're actually trying to do
> and what the optimization is, instead of trying so hard to bypass it
> altogether.
>
> In asm/pgtable_64.h, provide is_hugetlb_pte(); I'd implement it like
> this:
>
> static inline unsigned long __pte_huge_mask(void)
> {
> 	unsigned long mask;
>
> 	__asm__ __volatile__(
> 	"\n661:	sethi		%%uhi(%1), %0\n"
> 	"	sllx		%0, 32, %0\n"
> 	"	.section	.sun4v_2insn_patch, \"ax\"\n"
> 	"	.word		661b\n"
> 	"	mov		%2, %0\n"
> 	"	nop\n"
> 	"	.previous\n"
> 	: "=r" (mask)
> 	: "i" (_PAGE_SZHUGE_4U), "i" (_PAGE_SZHUGE_4V));
>
> 	return mask;
> }
>
> Then pte_mkhuge() becomes:
>
> static inline pte_t pte_mkhuge(pte_t pte)
> {
> 	return __pte(pte_val(pte) | __pte_huge_mask());
> }
>
> and then:
>
> static inline bool is_hugetlb_pte(pte_t pte)
> {
> 	return (pte_val(pte) & __pte_huge_mask());
> }
>
> And then tlb_batch_add() can detect if the original PTE is huge:
>
> 	bool huge = is_hugetlb_pte(orig);
>
> and then the end of the function is:
>
> 	if (huge && (vaddr & (REAL_HPAGE_SIZE - 1)))
> 		return;
> 	if (!fullmm)
> 		tlb_batch_add_one(mm, vaddr, pte_exec(orig), huge);
>
> and tlb_batch_add_one() takes 'huge' and uses it to drive the flushing.
>
> For a synchronous flush, we pass it down to flush_tsb_user_page().
>
> For a batched flush we store it in tlb_batch, and any time 'huge'
> changes, we do a flush_tlb_pending(), just the same as if tb->tlb_nr
> hit TLB_BATCH_NR.
>
> Then flush_tsb_user() uses 'tb->huge' to decide whether to flush
> MM_TSB_BASE or not.

Thanks for the detailed notes. I will now combine my last two patches
(which fix TLB flushing on PTE change and on zeroing out, respectively)
with the changes you suggested, and send a v2 soon. To make sure I've
understood the batching scheme correctly, I've sketched below how I plan
to wire 'huge' through tlb_batch_add_one() and flush_tsb_user().

Thanks,
Nitin
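
Roughly, for tlb_batch_add_one() in arch/sparc/mm/tlb.c; this is a
sketch only, since the 'huge' argument, the tb->huge field, and the
extra flush_tsb_user_page() parameter don't exist in the tree yet, and
details may change in v2:

static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
			      bool exec, bool huge)
{
	struct tlb_batch *tb = &get_cpu_var(tlb_batch);
	unsigned long nr;

	vaddr &= PAGE_MASK;
	if (exec)
		vaddr |= 0x1UL;

	nr = tb->tlb_nr;

	/* Switching to a different mm drains the pending batch. */
	if (unlikely(nr != 0 && mm != tb->mm)) {
		flush_tlb_pending();
		nr = 0;
	}

	if (!tb->active) {
		/* Synchronous flush: pass 'huge' down so only the
		 * relevant TSB (base vs. huge) is touched.
		 */
		flush_tsb_user_page(mm, vaddr, huge);
		global_flush_tlb_page(mm, vaddr);
		goto out;
	}

	if (nr == 0) {
		tb->mm = mm;
		tb->huge = huge;
	}

	/* Any time 'huge' changes, flush the pending batch, just the
	 * same as if tb->tlb_nr had hit TLB_BATCH_NR.
	 */
	if (tb->huge != huge) {
		flush_tlb_pending();
		tb->huge = huge;
		nr = 0;
	}

	tb->vaddrs[nr] = vaddr;
	tb->tlb_nr = ++nr;
	if (nr >= TLB_BATCH_NR)
		flush_tlb_pending();

out:
	put_cpu_var(tlb_batch);
}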
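
And correspondingly, flush_tsb_user() in arch/sparc/mm/tsb.c would key
off tb->huge to pick the TSB to flush (again just a sketch, under the
same assumptions):

void flush_tsb_user(struct tlb_batch *tb)
{
	struct mm_struct *mm = tb->mm;
	unsigned long nentries, base, flags;

	spin_lock_irqsave(&mm->context.lock, flags);

	/* A batch is now either all-huge or all-base, so MM_TSB_BASE
	 * can be skipped entirely for hugepage batches.
	 */
	if (!tb->huge) {
		base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
		nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
			base = __pa(base);
		__flush_tsb_one(tb, PAGE_SHIFT, base, nentries);
	}
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
	if (tb->huge && mm->context.tsb_block[MM_TSB_HUGE].tsb) {
		base = (unsigned long) mm->context.tsb_block[MM_TSB_HUGE].tsb;
		nentries = mm->context.tsb_block[MM_TSB_HUGE].tsb_nentries;
		if (tlb_type == cheetah_plus || tlb_type == hypervisor)
			base = __pa(base);
		__flush_tsb_one(tb, REAL_HPAGE_SHIFT, base, nentries);
	}
#endif
	spin_unlock_irqrestore(&mm->context.lock, flags);
}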