On Sat, 6 Apr 2019, James Bottomley wrote:
> On Sat, 2019-04-06 at 22:15 +0200, Helge Deller wrote:
> > On 06.04.19 21:49, James Bottomley wrote:
> > > On Sat, 2019-04-06 at 15:36 -0400, Mikulas Patocka wrote:
> > > > Parisc uses a global spinlock to protect pagetable updates in the
> > > > TLB fault handlers. When multiple cores are taking TLB faults
> > > > simultaneously, the cache line containing the spinlock becomes a
> > > > bottleneck.
> > >
> > > You can't do this. As the comment in cache.c says: the lock is to
> > > protect the Merced bus, which runs between the CPUs on some
> > > systems. That means it must be a single, global lock. Of course,
> > > on systems without a Merced bus, we don't need the lock at all,
> > > so runtime patching might be usable to fix that case.
> >
> > Is there a way to detect if a system has the Merced bus?
> >
> > See arch/parisc/include/asm/tlbflush.h too:
> > /* This is for the serialisation of PxTLB broadcasts. At least on the
> >  * N class systems, only one PxTLB inter processor broadcast can be
> >  * active at any one time on the Merced bus. This tlb purge
> >  * synchronisation is fairly lightweight and harmless so we activate
> >  * it on all systems not just the N class. */
> >
> > A 30% speed improvement from Mikulas's patches doesn't seem
> > lightweight...
>
> Well, that's because when it was originally conceived the patch was
> only about purging. It never actually involved the TLB insertion hot
> path. It turns out the entanglement occurred here:
>
> commit 01ab60570427caa24b9debc369e452e86cd9beb4
> Author: John David Anglin <dave.anglin@xxxxxxxx>
> Date:   Wed Jul 1 17:18:37 2015 -0400
>
>     parisc: Fix some PTE/TLB race conditions and optimize
>     __flush_tlb_range based on timing results
>
> That is when the dbit lock got replaced by the tlb purge lock. I have
> some vague memories about why we needed the dbit lock which I'll try
> to make more coherent.
> James

Before this patch, the code used pa_dbit_lock for modifying pagetables
and pa_tlb_lock for flushing, so it still suffered the performance
penalty of the shared pa_dbit_lock.

Perhaps the proper solution would be to use the global pa_tlb_lock for
flushing and a per-process tlb lock for pagetable updates.

Mikulas