Re: [PATCH RFC 3/4] arch/sparc: Optimized memcpy, memset, copy_to_user, copy_from_user for M7

David, thanks for the comments. I am working on addressing your feedback.

Comments inline below.


On 7/29/2017 4:36 PM, David Miller wrote:
From: Babu Moger <babu.moger@xxxxxxxxxx>
Date: Thu, 27 Jul 2017 15:57:29 -0600

@@ -600,7 +600,7 @@ niagara_tlb_fixup:
  	be,pt	%xcc, niagara4_patch
  	 nop
  	cmp	%g1, SUN4V_CHIP_SPARC_M7
-	be,pt	%xcc, niagara4_patch
+	be,pt	%xcc, sparc_m7_patch
  	 nop
  	cmp	%g1, SUN4V_CHIP_SPARC_SN
  	be,pt	%xcc, niagara4_patch
This part will need to be respun now that the M8 patches are in
as there will be a slight conflict in this hunk.
Actually, these patches have been tested on both M7 and M8. I wanted to add M8 as well, but the M8 patches were not in the kernel yet. Now that the M8 patches (from Allen) are in, I can add it, roughly as in the sketch below.
Will update it in the second version.
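For reference, assuming the M8 series defines a SUN4V_CHIP_SPARC_M8 chip type and that M8 should take the same memcpy/memset patch path as M7, the respun hunk would look roughly like this (a sketch, not the final respin):

	cmp	%g1, SUN4V_CHIP_SPARC_M7
	be,pt	%xcc, sparc_m7_patch		! M7: use the new M7 routines
	 nop
	cmp	%g1, SUN4V_CHIP_SPARC_M8	! assumed constant from the M8 series
	be,pt	%xcc, sparc_m7_patch		! M8: same patch path as M7
	 nop
	cmp	%g1, SUN4V_CHIP_SPARC_SN
	be,pt	%xcc, niagara4_patch
	 nop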
+        .register      %g2,#scratch
+
+       .section        ".text"
+       .global FUNC_NAME
+       .type   FUNC_NAME, #function
+       .align  16
+FUNC_NAME:
+	srlx            %o2, 31, %g2
+	cmp             %g2, 0
+	tne             %xcc, 5
+	PREAMBLE
+	mov		%o0, %g1	! save %o0
+	brz,pn          %o2, .Lsmallx
+
+	cmp            %o2, 3
+        ble,pn          %icc, .Ltiny_cp
+         cmp            %o2, 19
+        ble,pn          %icc, .Lsmall_cp
+         or             %o0, %o1, %g2
+        cmp             %o2, SMALL_MAX
+        bl,pn           %icc, .Lmedium_cp
+         nop
What in the world is going on with this indentation?

I can't comprehend how anyone who actually put their eyes on
this code and the patch itself wouldn't notice this.

DO NOT mix all-spaced and TAB+space indentation.

Always, consistently, use as many TABs as you can and
then when needed add trailing spaces.

Sure, will address these problems. In general, I will fix all of the formatting issues (see the re-indented example below). Thanks.
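For example, the entry-sequence hunk quoted above, re-indented with leading TABs and a single extra space before each delay-slot instruction, would look roughly like this (a sketch of the convention, not the final code):

	cmp	%o2, 3
	ble,pn	%icc, .Ltiny_cp		! 3 bytes or fewer
	 cmp	%o2, 19			! delay slot: next length check
	ble,pn	%icc, .Lsmall_cp	! 19 bytes or fewer
	 or	%o0, %o1, %g2		! delay slot: combined alignment bits
	cmp	%o2, SMALL_MAX
	bl,pn	%icc, .Lmedium_cp	! below SMALL_MAX
	 nop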

+.Lsrc_dst_aligned_on_8:
+	! check if we are copying MED_MAX or more bytes
+        set MED_MAX, %o3
+        cmp %o2, %o3 			! limit to store buffer size
+	bgu,pn	%ncc, .Llarge_align8_copy
+	 nop
Again, same problem here.

+/*
+ * Handle all cases where src and dest are aligned on word
+ * boundaries. Use unrolled loops for better performance.
+ * This option wins over standard large data move when
+ * source and destination is in cache for.Lmedium
+ * to short data moves.
+ */
+        set MED_WMAX, %o3
+        cmp %o2, %o3 			! limit to store buffer size
+	bge,pt	%ncc, .Lunalignrejoin	! otherwise rejoin main loop
+	 nop
More weird indentation.

+.dbalign:
+        andcc   %o5, 7, %o3             ! is sp1 aligned on a 8 byte bound?
+        bz,pt   %ncc, .blkalign         ! already long word aligned
+         sub     %o3, 8, %o3             ! -(bytes till long word aligned)
+
+        add     %o2, %o3, %o2           ! update o2 with new count
+        ! Set -(%o3) bytes till sp1 long word aligned
+1:      stb     %o1, [%o5]              ! there is at least 1 byte to set
+	inccc   %o3                     ! byte clearing loop
+        bl,pt   %ncc, 1b
+	 inc     %o5
More weird indentation.

+        ! Now sp1 is block aligned
+.blkwr:
+        andn    %o2, 63, %o4            ! calculate size of blocks in bytes
+        brz,pn  %o1, .wrzero            ! special case if c == 0
+         and     %o2, 63, %o3            ! %o3 = bytes left after blk stores.
+
+        set     MIN_LOOP, %g1
+        cmp     %o4, %g1                ! check there are enough bytes to set
+	blu,pn  %ncc, .short_set        ! to justify cost of membar
+                                        ! must be > pre-cleared lines
+         nop
Likewise.

+
+        ! initial cache-clearing stores
+        ! get store pipeline moving
+	rd	%asi, %g3		! save %asi to be restored later
+        wr     %g0, ASI_STBIMRU_P, %asi
Likewise.

+.wrzero_small:
+        stxa    %o1, [%o5]ASI_STBI_P
+        subcc   %o4, 64, %o4
+        bgu,pt  %ncc, .wrzero_small
+         add     %o5, 64, %o5
+	ba,a	.bsi_done
Likewise.

+.asi_done:
+	wr	%g3, 0x0, %asi		! restored saved %asi
+.bsi_done:
+        membar  #StoreStore             ! required by use of Block Store Init
Likewise.

+	.size		M7memset,.-M7memset
It's usually a lot better to use ENTRY() and ENDPROC() instead of
expanding these kinds of directives out.

Ok, sure. Will address it; see the ENTRY()/ENDPROC() sketch below.
+	.globl	m7_patch_copyops
+	.type	m7_patch_copyops,#function
+m7_patch_copyops:
ENTRY()
Sure.
+	.size	m7_patch_copyops,.-m7_patch_copyops
ENDPROC()
Sure.
+	.globl	m7_patch_bzero
+	.type	m7_patch_bzero,#function
+m7_patch_bzero:
Likewise.
Ok.
+	.size	m7_patch_bzero,.-m7_patch_bzero
Likewise.
Ok.
+	.globl	m7_patch_pageops
+	.type	m7_patch_pageops,#function
+m7_patch_pageops:
Likewise.
Ok.
+	.size	m7_patch_pageops,.-m7_patch_pageops
Likewise.
Ok.
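As a rough sketch of what David is asking for, with ENTRY()/ENDPROC() from <linux/linkage.h> replacing the hand-rolled .globl/.type/.size directives (the instruction-patching bodies and the final return sequence are elided here, since they stay as in the original hunks):

#include <linux/linkage.h>

ENTRY(m7_patch_copyops)			! expands to .globl/.align and the label
	! ... instruction-patching body as in the original hunk ...
ENDPROC(m7_patch_copyops)		! emits the .type and .size directives

m7_patch_bzero and m7_patch_pageops would get the same treatment.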