Re: [PATCH] parisc: use per-pagetable spinlock

On Sun, 2019-04-07 at 13:23 -0400, Mikulas Patocka wrote:
> 
> On Sat, 6 Apr 2019, James Bottomley wrote:
> 
> > On Sat, 2019-04-06 at 22:15 +0200, Helge Deller wrote:
> > > On 06.04.19 21:49, James Bottomley wrote:
> > > > On Sat, 2019-04-06 at 15:36 -0400, Mikulas Patocka wrote:
> > > > > Parisc uses a global spinlock to protect pagetable updates in
> > > > > the TLB fault handlers. When multiple cores are taking TLB
> > > > > faults simultaneously, the cache line containing the spinlock
> > > > > becomes a bottleneck.
> > > > 
> > > > You can't do this.  As the comment in cache.c says: the lock is
> > > > to protect the merced bus, which runs between the CPUs on some
> > > > systems.  That means it must be a single, global lock.  Of
> > > > course, on systems without a merced bus, we don't need the lock
> > > > at all, so runtime patching might be usable to fix that case.
> > > 
> > > Is there a way to detect if a system has the Merced bus?
> > > 
> > > See arch/parisc/include/asm/tlbflush.h too:
> > > /* This is for the serialisation of PxTLB broadcasts.  At least on the
> > >  * N class systems, only one PxTLB inter processor broadcast can be
> > >  * active at any one time on the Merced bus.  This tlb purge
> > >  * synchronisation is fairly lightweight and harmless so we activate
> > >  * it on all systems not just the N class.
> > > 
> > > A 30% speed improvement from Mikulas' patches doesn't seem
> > > lightweight...
> > 
> > Well, that's because when it was originally conceived the patch was
> > only about purging.  It never actually involved the TLB insertion
> > hot path.  It turns out the entanglement occurred here:
> > 
> > commit 01ab60570427caa24b9debc369e452e86cd9beb4
> > Author: John David Anglin <dave.anglin@xxxxxxxx>
> > Date:   Wed Jul 1 17:18:37 2015 -0400
> > 
> >     parisc: Fix some PTE/TLB race conditions and optimize
> > __flush_tlb_range based on timing results
> >  
> > 
> > Which is when the dbit lock got replaced by the tlb purge lock.  I
> > have some vague memories about why we needed the dbit lock which
> > I'll try to make more coherent.
> > 
> > James
> 
> Before this patch, it used pa_dbit_lock for modifying pagetables and 
> pa_tlb_lock for flushing.
> 
> So it still suffered the performance penalty with shared
> pa_dbit_lock.
> 
> Perhaps the proper thing would be to use global pa_tlb_lock for
> flushing and per-process tlb lock for pagetable updates.

Actually, I'm not sure we need to go back to the dbit lock.  The design
goal of the updates is not to lose set bits.  It's slightly
inefficient, but not a problem, if we lose cleared bits (a page that
should be clean stays dirty and gets rewritten, or a page that should
be made old stays young in the cache), and it's not actually a problem
if we lose setting the accessed bit: an active page may just get lost
from the LRU list and have to be faulted in on next access.  So if the
problem boils down to ensuring the D bit stays set, we can update it,
then check and rewrite it if it comes back clear.  We can ensure
forward progress because the clearing sequences don't repeat.
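
As a rough illustration of that update-check-rewrite idea (the helper
name is made up, and the generic accessors set_pte(), pte_mkdirty() and
pte_dirty() just stand in for whatever the real fault path would use;
this is not the actual parisc handler):

/*
 * Minimal sketch of the "set, check, rewrite" loop for the dirty bit.
 */
static inline void pte_set_dirty_checked(pte_t *ptep)
{
	pte_t pte;

	do {
		pte = *ptep;				/* snapshot the current PTE */
		set_pte(ptep, pte_mkdirty(pte));	/* write it back with D set */
		/*
		 * A racing clear may have wiped the bit between our read
		 * and our write; re-read and retry until the D bit is
		 * observed set.  The loop terminates because the clearing
		 * sequences don't repeat.
		 */
	} while (!pte_dirty(*ptep));
}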

James
