On 2013-02-26 07:24, Kyungsik Lee wrote:
> Hi,
>
> [...]
>
> Through the benchmark, it was found that -Os Compiler flag for
> decompress.o brought better decompression performance in most of cases
> (ex, different compiler and hardware spec.) in ARM architecture.
>
> Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
> option even though it is supported. The decompression speed can be
> slightly slower in some cases.
>
> This patchset is based on 3.8.
>
> Any comments are appreciated.

Did you actually *try* the new LZO version and the patch (which is attached
once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?

Because the new LZO version is faster than LZ4 in my testing, at least
when comparing apples with apples and enabling unaligned access in
BOTH versions:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

               compression speed   decompression speed

LZO-2012    :      44 MB/sec          117 MB/sec         no unaligned access
LZO-2013-UA :      47 MB/sec          167 MB/sec         Unaligned Access
LZ4 r88  UA :      46 MB/sec          154 MB/sec         Unaligned Access

~Markus

>
> Thanks,
> Kyungsik
>
>
> Benchmark Results(PATCH v2)
> Compiler: Linaro ARM gcc 4.6.2
> 1. ARMv7, 1.5GHz based board
>    Kernel: linux 3.4
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.7MB            21.1MB/s
>    LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)
> 2. ARMv7, 1.7GHz based board
>    Kernel: linux 3.7
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.0MB            34.1MB/s
>    LZ4  6.5MB            86.7MB/s
> UA: Unaligned memory Access support
>
>
> Change log: v2
> - Clean up code
> - Enable unaligned access for ARM v6 and above with
>   CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> - Add lz4_decompress() for faster decompression with
>   uncompressed output size
> - Use lz4_decompress() for LZ4-compressed kernel during
>   boot-process
> - Apply -Os to decompress.o to improve decompress
>   performance during boot-up process
>
>
> Kyungsik Lee (4):
>   decompressor: Add LZ4 decompressor module
>   lib: Add support for LZ4-compressed kernel
>   arm: Add support for LZ4-compressed kernel
>   x86: Add support for LZ4-compressed kernel
>
>  arch/arm/Kconfig                      |   1 +
>  arch/arm/boot/compressed/.gitignore   |   1 +
>  arch/arm/boot/compressed/Makefile     |   6 +-
>  arch/arm/boot/compressed/decompress.c |   4 +
>  arch/arm/boot/compressed/piggy.lz4.S  |   6 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/compressed/Makefile     |   5 +-
>  arch/x86/boot/compressed/misc.c       |   4 +
>  include/linux/decompress/unlz4.h      |  10 +
>  include/linux/lz4.h                   |  48 +++++
>  init/Kconfig                          |  13 +-
>  lib/Kconfig                           |   7 +
>  lib/Makefile                          |   2 +
>  lib/decompress.c                      |   5 +
>  lib/decompress_unlz4.c                | 190 +++++++++++++++++++
>  lib/lz4/Makefile                      |   1 +
>  lib/lz4/lz4_decompress.c              | 331 ++++++++++++++++++++++++++++++++++
>  lib/lz4/lz4defs.h                     |  93 ++++++++++
>  scripts/Makefile.lib                  |   5 +
>  usr/Kconfig                           |   9 +
>  20 files changed, 739 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm/boot/compressed/piggy.lz4.S
>  create mode 100644 include/linux/decompress/unlz4.h
>  create mode 100644 include/linux/lz4.h
>  create mode 100644 lib/decompress_unlz4.c
>  create mode 100644 lib/lz4/Makefile
>  create mode 100644 lib/lz4/lz4_decompress.c
>  create mode 100644 lib/lz4/lz4defs.h
>

--
Markus Oberhumer, <markus@xxxxxxxxxxxxx>, http://www.oberhumer.com/
commit 8745b927fcfcd6953ada9bd1220a73083db5948a
Author: Markus F.X.J. Oberhumer <markus@xxxxxxxxxxxxx>
Date:   Mon Feb 4 02:26:14 2013 +0100

    lib/lzo: huge LZO decompression speedup on ARM by using unaligned access

    Signed-off-by: Markus F.X.J. Oberhumer <markus@xxxxxxxxxxxxx>

diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 569985d..e3edc5f 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -72,9 +72,11 @@ copy_literal_run:
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+# if !defined(__arm__)
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+# endif
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -159,9 +161,11 @@ copy_literal_run:
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+# if !defined(__arm__)
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+# endif
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 5a4beb2..b230601 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,14 @@
  */
 
 
+#if 1 && defined(__arm__) && ((__LINUX_ARM_ARCH__ >= 6) || defined(__ARM_FEATURE_UNALIGNED))
+#define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1
+#define COPY4(dst, src)	\
+		* (u32 *) (void *) (dst) = * (const u32 *) (const void *) (src)
+#else
 #define COPY4(dst, src)	\
 		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
 #if defined(__x86_64__)
 #define COPY8(dst, src)	\
 		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
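
For readers comparing the two COPY4 variants in the lzodefs.h hunk: the ARM branch dereferences a cast pointer directly, while the fallback goes through put_unaligned()/get_unaligned(). Outside the kernel, a common way to express the same 4-byte copy without a cast dereference is memcpy() through a temporary. The sketch below is only an illustration in plain userspace C; copy4_portable() is a hypothetical name and is not part of the patch. Whether a given compiler turns it into a single load/store pair on ARMv6+ is an assumption that would need checking in the generated code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): copy 4 bytes between
 * addresses that may be unaligned. Going through memcpy() avoids
 * dereferencing a cast u32 pointer, so the code stays valid C even
 * on targets that fault on unaligned word accesses.
 */
static inline void copy4_portable(void *dst, const void *src)
{
	uint32_t tmp;

	memcpy(&tmp, src, sizeof(tmp));	/* read 4 bytes from src */
	memcpy(dst, &tmp, sizeof(tmp));	/* write 4 bytes to dst */
}
```

The temporary also means the full word is read before anything is written, which matches what the direct assignment in the ARM branch does.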