On 2018-08-24 9:27 AM, Rolf Eike Beer wrote:
> On 2018-08-24 14:09, Helge Deller wrote:
>> On 2018-08-21 2:31 PM, Rolf Eike Beer wrote:
>>> With this patch I get this timing:
>>> gcc7 - Time: 2018-08-24T01:42:07
>>> gcc6 - Time: 2018-08-24T06:01:27
>>> Somewhere in between 4.17.3 and 4.18.0.
>> Rolf, I think plain 4.18 kernel misses Dave's speed-up patches:
>> * http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7797167ffde1f00446301cb22b37b7c03194cfaf
>> * http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b885ac1dc35b87a39ee176a6c7e2af9c789d8b8
>> Both patches have been scheduled to be added to 4.18-stable kernel.
> I guess I better revert the previous patch from Dave, no?
I would say leave the patch. The big part of the slowdown is the sync barrier in the TLB handler. The above patches don't address this issue; they should speed up spin locks in general.
On PA 2.0 SMP machines, we need either a sync or an ordered store to release a spin lock. Otherwise, the lock may be released before the other accesses in the lock region are complete. As a result, the operation isn't atomic from the perspective of other CPUs. There's no getting around this issue on PA 2.0 systems.
I plan to look more at using ordered loads and stores in the spin lock code, as they clearly don't impact performance as much as sync.
Regarding the TLB code, it turned out we were always setting the page accessed bit for user pages. So, the code to set it when a user page is accessed is redundant. We need to lock to update the accessed and dirty bits atomically. We can keep the current TLB locking code and not set the page accessed bit in our user page defines. This should improve swap, but the TLB handler is more complex. Another alternative is to remove the locking and accessed-update code from the TLB handler. This provides the best performance for TLB inserts, but swap performance will be worse since we don't track the accessed bit.
Dave
--
John David Anglin dave.anglin@xxxxxxxx