[to-be-updated] lib-lzo-clean-up-by-introducing-copy16.patch removed from -mm tree

The patch titled
     Subject: lib/lzo: clean-up by introducing COPY16
has been removed from the -mm tree.  Its filename was
     lib-lzo-clean-up-by-introducing-copy16.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Matt Sealey <matt.sealey@xxxxxxx>
Subject: lib/lzo: clean-up by introducing COPY16

Most compilers should be able to merge adjacent loads/stores whose
individual sizes are smaller than a machine word but which together make
up a multiple of it (in effect a memcpy() of a constant amount). However,
the macro only performs the copy; the pointer increments are in the
surrounding code, hence we see

    *a = *b
    a += 8
    b += 8
    *a = *b
    a += 8
    b += 8

This introduces a dependency between the two groups of statements, which
seems to defeat said compiler optimizers and to generate some very
strange sequences of addition and subtraction of address offsets (i.e.
the generated code is overcomplicated).

Since COPY8 is only ever used in pairs to copy 16 bytes, just define
COPY16 as two COPY8s. We keep the COPY8 definition so that the copies
remain unaligned accesses to machine-sized words, per the original code's
intent; we just don't use it directly in the code proper.

COPY16 then gives us code like:

    *a = *b
    *(a+8) = *(b+8)
    a += 16
    b += 16

This seems to allow compilers to generate much better code, using base
register writeback or simply increasing offsets, which seems to improve
performance. It is, at least, fewer instructions to do the same job.
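
As a rough userspace illustration of the two shapes described above (a
sketch only, not the kernel code): COPY8 here is a memcpy()-based stand-in
assumed for the example, not the kernel's unaligned-access definition, and
the helper names copy16_interleaved() and copy16_grouped() are hypothetical.
Both helpers should copy the same 16 bytes and leave the pointers advanced
by 16; only the placement of the increments differs.

```c
#include <string.h>

/*
 * Assumption: a memcpy()-based stand-in for COPY8. The kernel's lzodefs.h
 * defines COPY8 differently (via unaligned-access helpers); this version
 * only exists so the sketch builds in userspace.
 */
#define COPY8(dst, src) memcpy((dst), (src), 8)

/* Same shape as the COPY16 added to lzodefs.h by this patch. */
#define COPY16(dst, src) \
	do { COPY8(dst, src); COPY8((dst) + 8, (src) + 8); } while (0)

/* Old pattern: each 8-byte copy is followed by its own pointer updates. */
static void copy16_interleaved(unsigned char **op, const unsigned char **ip)
{
	COPY8(*op, *ip);
	*op += 8;
	*ip += 8;
	COPY8(*op, *ip);
	*op += 8;
	*ip += 8;
}

/* New pattern: both copies use constant offsets, then one update of 16. */
static void copy16_grouped(unsigned char **op, const unsigned char **ip)
{
	COPY16(*op, *ip);
	*op += 16;
	*ip += 16;
}
```

Whether a given compiler actually emits better code for the grouped form
depends on the target and optimization level; comparing the assembly
output for the two helpers is the way to check.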

Link: http://lkml.kernel.org/r/20181127161913.23863-3-dave.rodgman@xxxxxxx
Signed-off-by: Matt Sealey <matt.sealey@xxxxxxx>
Signed-off-by: Dave Rodgman <dave.rodgman@xxxxxxx>
Cc: David S. Miller <davem@xxxxxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Cc: Markus F.X.J. Oberhumer <markus@xxxxxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Nitin Gupta <nitingupta910@xxxxxxxxx>
Cc: Richard Purdie <rpurdie@xxxxxxxxxxxxxx>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx>
Cc: Sonny Rao <sonnyrao@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 lib/lzo/lzo1x_compress.c        |    9 +++------
 lib/lzo/lzo1x_decompress_safe.c |   18 ++++++------------
 lib/lzo/lzodefs.h               |    3 +++
 3 files changed, 12 insertions(+), 18 deletions(-)

--- a/lib/lzo/lzo1x_compress.c~lib-lzo-clean-up-by-introducing-copy16
+++ a/lib/lzo/lzo1x_compress.c
@@ -60,8 +60,7 @@ next:
 				op += t;
 			} else if (t <= 16) {
 				*op++ = (t - 3);
-				COPY8(op, ii);
-				COPY8(op + 8, ii + 8);
+				COPY16(op, ii);
 				op += t;
 			} else {
 				if (t <= 18) {
@@ -76,8 +75,7 @@ next:
 					*op++ = tt;
 				}
 				do {
-					COPY8(op, ii);
-					COPY8(op + 8, ii + 8);
+					COPY16(op, ii);
 					op += 16;
 					ii += 16;
 					t -= 16;
@@ -255,8 +253,7 @@ int lzo1x_1_compress(const unsigned char
 			*op++ = tt;
 		}
 		if (t >= 16) do {
-			COPY8(op, ii);
-			COPY8(op + 8, ii + 8);
+			COPY16(op, ii);
 			op += 16;
 			ii += 16;
 			t -= 16;
--- a/lib/lzo/lzo1x_decompress_safe.c~lib-lzo-clean-up-by-introducing-copy16
+++ a/lib/lzo/lzo1x_decompress_safe.c
@@ -86,12 +86,9 @@ copy_literal_run:
 					const unsigned char *ie = ip + t;
 					unsigned char *oe = op + t;
 					do {
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
+						COPY16(op, ip);
+						op += 16;
+						ip += 16;
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -187,12 +184,9 @@ copy_literal_run:
 			unsigned char *oe = op + t;
 			if (likely(HAVE_OP(t + 15))) {
 				do {
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
+					COPY16(op, m_pos);
+					op += 16;
+					m_pos += 16;
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
--- a/lib/lzo/lzodefs.h~lib-lzo-clean-up-by-introducing-copy16
+++ a/lib/lzo/lzodefs.h
@@ -23,6 +23,9 @@
 		COPY4(dst, src); COPY4((dst) + 4, (src) + 4)
 #endif
 
+#define COPY16(dst, src) \
+	do { COPY8(dst, src); COPY8((dst) + 8, (src) + 8); } while (0)
+
 #if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
 #error "conflicting endian definitions"
 #elif defined(CONFIG_X86_64)
_

Patches currently in -mm which might be from matt.sealey@xxxxxxx are

lib-lzo-enable-64-bit-ctz-on-arm.patch
lib-lzo-64-bit-ctz-on-arm64.patch
lib-lzo-fast-8-byte-copy-on-arm64.patch



