Almost certainly still faster than bytewise copy!

"Yinghai Lu" <yinghai@xxxxxxxxxx> wrote:

>On 10/07/2010 10:40 PM, tip-bot for Zhao Yakui wrote:
>> Commit-ID:  68f4d5a00adaab33b136fce2c72d5c377b39b0b0
>> Gitweb:     http://git.kernel.org/tip/68f4d5a00adaab33b136fce2c72d5c377b39b0b0
>> Author:     Zhao Yakui <yakui.zhao@xxxxxxxxx>
>> AuthorDate: Fri, 8 Oct 2010 09:47:33 +0800
>> Committer:  H. Peter Anvin <hpa@xxxxxxxxx>
>> CommitDate: Thu, 7 Oct 2010 21:23:09 -0700
>>
>> x86, setup: Use string copy operation to optimze copy in kernel compression
>>
>> The kernel decompression code parses the ELF header and then copies
>> the segment to the corresponding destination.  Currently it uses slow
>> byte-copy code.  This patch makes it use the string copy operations
>> instead.
>>
>> In the test the copy performance can be improved very significantly after using
>> the string copy operation mechanism.
>>  1. The copy time can be reduced from 150ms to 20ms on one Atom machine
>>  2. The copy time can be reduced about 80% on another machine
>>      The time is reduced from 7ms to 1.5ms when using 32-bit kernel.
>>      The time is reduced from 10ms to 2ms when using 64-bit kernel.
>>
>> Signed-off-by: Zhao Yakui <yakui.zhao@xxxxxxxxx>
>> LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@xxxxxxxxx>
>> Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxx>
>> ---
>>  arch/x86/boot/compressed/misc.c |   29 +++++++++++++++++++++++------
>>  1 files changed, 23 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
>> index 8f7bef8..23f315c 100644
>> --- a/arch/x86/boot/compressed/misc.c
>> +++ b/arch/x86/boot/compressed/misc.c
>> @@ -229,18 +229,35 @@ void *memset(void *s, int c, size_t n)
>>  		ss[i] = c;
>>  	return s;
>>  }
>> -
>> +#ifdef CONFIG_X86_32
>>  void *memcpy(void *dest, const void *src, size_t n)
>>  {
>> -	int i;
>> -	const char *s = src;
>> -	char *d = dest;
>> +	int d0, d1, d2;
>> +	asm volatile(
>> +		"rep ; movsl\n\t"
>> +		"movl %4,%%ecx\n\t"
>> +		"rep ; movsb\n\t"
>> +		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
>> +		: "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
>> +		: "memory");
>>
>> -	for (i = 0; i < n; i++)
>> -		d[i] = s[i];
>>  	return dest;
>>  }
>> +#else
>> +void *memcpy(void *dest, const void *src, size_t n)
>> +{
>> +	long d0, d1, d2;
>> +	asm volatile(
>> +		"rep ; movsq\n\t"
>> +		"movq %4,%%rcx\n\t"
>> +		"rep ; movsb\n\t"
>> +		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
>> +		: "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
>> +		: "memory");
>>
>> +	return dest;
>> +}
>> +#endif
>>
>>  static void error(char *x)
>>  {
>
>wonder if it would have problem with some old AMD K8 systems.
>
>in amd.c
>
>	/* On C+ stepping K8 rep microcode works well for copy/memset */
>	if (c->x86 == 0xf) {
>		u32 level;
>
>		level = cpuid_eax(1);
>		if ((level >= 0x0f48 && level < 0x0f50) || level >= 0x0f58)
>			set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>...
>
>	}
>	if (c->x86 >= 0x10)
>		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>
>Yinghai

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.
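
For readers following the quoted patch: its 64-bit variant hands n >> 3 to "rep ; movsq" and n & 7 to "rep ; movsb" (n >> 2 and n & 3 with movsl on 32-bit). Below is a rough portable C sketch of that same word-then-tail split. It is only an illustration of the arithmetic, not the code in misc.c; the function name copy_words_then_tail is made up here, and the fixed 8-byte word size is an assumption matching the 64-bit case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel patch): copy n bytes as
 * 8-byte words first, then finish the remaining 0..7 bytes one at a time.
 */
static void *copy_words_then_tail(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	size_t words = n >> 3;	/* the count the patch loads for "rep ; movsq" */
	size_t tail = n & 7;	/* the count the patch loads for "rep ; movsb" */

	while (words--) {
		uint64_t w;

		/* Fixed-size library memcpy keeps unaligned access well defined. */
		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		s += sizeof(w);
		d += sizeof(w);
	}

	while (tail--)
		*d++ = *s++;

	return dest;
}
```

Whether the string instructions actually beat a plain loop on a given CPU is a separate question, which is what the REP_GOOD check quoted from amd.c is about; this sketch says nothing on that point.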