Re: Slowdown with kernel 4.18.0

Hi Eike,

On 2018-08-21 2:31 PM, Rolf Eike Beer wrote:
Timing before:
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 4.7 Aug 17, 2018 - 13:37
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 4.8 Aug 17, 2018 - 11:19
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 4.9 Aug 17, 2018 - 09:06
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 5   Aug 17, 2018 - 06:50
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 6   Aug 17, 2018 - 04:25
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 7   Aug 17, 2018 - 00:54

Timing after:
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 4.9 Aug 20, 2018 - 13:02:06
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 5   Aug 20, 2018 - 10:04:10
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 6   Aug 20, 2018 - 06:51
pioneer.sf-tec.de    Gentoo HPPA C8000 GCC 7   Aug 20, 2018 - 02:08
Why such a big difference in times between gcc versions?  I think it
would be useful to use time to compare system times.
That is expected: I do a full build of CMake with the most recent gcc (and run
all tests), and the other versions will pick up the already built binary and
just run the test suite. The other dashboards (Qsmtp in this case, also
libarchive) run the full set for all versions. So while the times are not
really comparable between versions, the delta should stay roughly the same for
different days.
Could you try the attached change to see how it affects the various
timings?  For compilations, I would expect the sync in the TLB code to
be the major issue.  Based on limited testing, the change does speed
things up.  The patch tries using a PA 2.0 ordered store to reset the
lock.

This is probably the best fix if it works.  Another alternative is to
replace the "sync" with an "LDCW" on the lock address (same as the lock
operation, but with the target register set to %r0).  I have the feeling
that "sync" is in some way heavy duty.  It probably works with
non-coherent caches.  The final option would be to drop the locking
(i.e., give up on setting the accessed bit).  The locking is there to
ensure we correctly set the accessed and dirty bits on SMP systems.
Note we don't use the accessed bit on kernel pages.
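
For anyone not fluent in PA-RISC assembly, here is a rough analogy of the
two unlock variants in portable C11 atomics.  It is only a sketch and not
the kernel code: the sketch_* names are made up for illustration, the
seq_cst fence stands in for "sync", and the release store stands in for
the PA 2.0 ordered store.  The lock word follows the LDCW convention
(nonzero means free, zero means held).

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the TLB lock word.
 * LDCW convention: nonzero = free, 0 = held. */
typedef struct {
    atomic_uint word;
} sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l, unsigned free_val)
{
    atomic_init(&l->word, free_val);
}

/* Acquire: atomically read the word and clear it, similar in spirit
 * to LDCW.  We own the lock once we read a nonzero value. */
static void sketch_lock(sketch_lock_t *l)
{
    while (atomic_exchange_explicit(&l->word, 0u,
                                    memory_order_acquire) == 0u)
        ; /* spin while another CPU holds the lock */
}

/* Old unlock (analogy for "sync; stw"): a full barrier followed by
 * a plain store of the nonzero value. */
static void sketch_unlock_fence(sketch_lock_t *l, unsigned free_val)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&l->word, free_val, memory_order_relaxed);
}

/* Patched unlock (analogy for the ordered store): one store that is
 * ordered after all earlier accesses, with no separate barrier. */
static void sketch_unlock_ordered(sketch_lock_t *l, unsigned free_val)
{
    atomic_store_explicit(&l->word, free_val, memory_order_release);
}
```

The point of the comparison is that the second unlock needs only a single
ordered store, which is what the patch is trying to exploit.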

Dave

--
John David Anglin  dave.anglin@xxxxxxxx

diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 1b4732e20137..651ad90fccd0 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -473,7 +473,7 @@
 	LDREG		0(\ptp),\pte
 	bb,<,n		\pte,_PAGE_PRESENT_BIT,2f
 	b		\fault
-	stw		 \spc,0(\tmp)
+	stw,ma		\spc,0(\tmp)
 2:
 #endif
 	.endm
@@ -482,9 +482,7 @@
 	.macro		tlb_unlock0	spc,tmp
 #ifdef CONFIG_SMP
 	or,COND(=)	%r0,\spc,%r0
-	sync
-	or,COND(=)	%r0,\spc,%r0
-	stw             \spc,0(\tmp)
+	stw,ma		\spc,0(\tmp)
 #endif
 	.endm
 
