Re: [PATCH] mm: release the spinlock on zap_pte_range

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 30 Jul 2019 12:42:07 -0700

On Mon, 29 Jul 2019 17:20:52 +0900 Minchan Kim <minchan@xxxxxxxxxx> wrote:

> > > @@ -1022,7 +1023,16 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> > >  	flush_tlb_batched_pending(mm);
> > >  	arch_enter_lazy_mmu_mode();
> > >  	do {
> > > -		pte_t ptent = *pte;
> > > +		pte_t ptent;
> > > +
> > > +		if (progress >= 32) {
> > > +			progress = 0;
> > > +			if (need_resched())
> > > +				break;
> > > +		}
> > > +		progress += 8;
> > 
> > Why 8?
> 
> Just copied from copy_pte_range.

copy_pte_range() does

		if (pte_none(*src_pte)) {
			progress++;
			continue;
		}
		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
							vma, addr, rss);
		if (entry.val)
			break;
		progress += 8;

which appears to be an attempt to balance the cost of copy_one_pte()
against the cost of not calling copy_one_pte().

Your code doesn't do this balancing and hence can be simpler.
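
To make the balancing concrete, here is a minimal userspace sketch of the
budget accounting. It is not kernel code: count_resched_checks() and the
three constants are hypothetical names, and the threshold of 32 is taken
from the quoted patch. It only counts how often the loop would reach the
need_resched() test for a given mix of empty and populated entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the progress budget; not kernel code. */
enum { CHECK_THRESHOLD = 32, COST_EMPTY = 1, COST_COPY = 8 };

/*
 * Count how many times the loop would reach the need_resched() test.
 * present[i] == false models pte_none(): a cheap, skipped entry.
 * present[i] == true models an entry that goes through copy_one_pte().
 */
size_t count_resched_checks(const bool *present, size_t n)
{
	size_t checks = 0;
	int progress = 0;

	assert(present != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (progress >= CHECK_THRESHOLD) {
			progress = 0;
			checks++;	/* the kernel loop would test need_resched() here */
		}
		if (!present[i]) {
			progress += COST_EMPTY;	/* cheap path: small charge */
			continue;
		}
		progress += COST_COPY;		/* expensive path: large charge */
	}
	return checks;
}
```

With these weights a run of populated entries reaches the check every
fourth entry, while a run of empty entries reaches it every 32nd.  A loop
that adds 8 unconditionally, as in the patch, always behaves like the
first case, so a plain iteration counter would express the same thing.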

It all seems a bit overdesigned.  need_resched() is cheap.  It's
possibly a mistake to check need_resched() on *every* loop iteration,
because some crazy scheduling load might livelock us.  But surely it
would be enough to do something like

	if (progress++ && need_resched()) {
		<reschedule>
		progress = 0;
	}

and leave it at that?
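
A minimal userspace sketch of how that pattern behaves, under stated
assumptions: zap_entries_model(), resched_always() and resched_never()
are hypothetical names, the need_resched callback stands in for the
kernel's need_resched(), and a counter stands in for <reschedule>.  It
is meant only to show the forward-progress property, not how
zap_pte_range() should actually drop and retake the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's need_resched(). */
typedef bool (*need_resched_fn)(void);

/* Stub schedulers for the model: always or never asking for a resched. */
bool resched_always(void) { return true; }
bool resched_never(void) { return false; }

/*
 * Walk n entries and return how many times the loop would reschedule.
 * Because progress is 0 right after a reschedule, the next iteration
 * skips the need_resched() test, so at least one entry is handled
 * between two consecutive reschedules.
 */
size_t zap_entries_model(size_t n, need_resched_fn need_resched)
{
	size_t reschedules = 0;
	int progress = 0;

	assert(need_resched != NULL);

	for (size_t i = 0; i < n; i++) {
		if (progress++ && need_resched()) {
			reschedules++;	/* stands in for <reschedule> */
			progress = 0;
		}
		/* entry i would be processed here */
	}
	return reschedules;
}
```

Even with a stub that always asks for a reschedule, the model yields on
at most every other entry, so the walk still completes.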