From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

For some reason GCC 6.2.1 here unrolls the memcpy calls to and from the
on-stack temporary in per-byte fashion, and also repeatedly loads offset
constants. It looks horrible, for example:

 ...
     fdc:	48 b8 41 00 00 00 00 	movabs rax,0xffff880000000041
     fe3:	88 ff ff
     fe6:	44 88 74 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r14b
     feb:	48 b8 42 00 00 00 00 	movabs rax,0xffff880000000042
     ff2:	88 ff ff
     ff5:	44 88 6c 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r13b
     ffa:	48 b8 43 00 00 00 00 	movabs rax,0xffff880000000043
    1001:	88 ff ff
    1004:	44 88 64 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r12b
    1009:	48 b8 44 00 00 00 00 	movabs rax,0xffff880000000044
    1010:	88 ff ff
    1013:	88 5c 06 80          	mov    BYTE PTR [rsi+rax*1-0x80],bl
    1017:	48 b8 45 00 00 00 00 	movabs rax,0xffff880000000045
    101e:	88 ff ff
    1021:	44 88 5c 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r11b
    1026:	48 b8 46 00 00 00 00 	movabs rax,0xffff880000000046
    102d:	88 ff ff
    1030:	44 88 54 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r10b
    1035:	48 b8 47 00 00 00 00 	movabs rax,0xffff880000000047
    103c:	88 ff ff
    103f:	44 88 4c 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],r9b
    1044:	0f b6 5d d0          	movzx  ebx,BYTE PTR [rbp-0x30]
    1048:	48 b8 48 00 00 00 00 	movabs rax,0xffff880000000048
    104f:	88 ff ff
    1052:	88 5c 06 80          	mov    BYTE PTR [rsi+rax*1-0x80],bl
    1056:	48 b8 49 00 00 00 00 	movabs rax,0xffff880000000049
    105d:	88 ff ff
    1060:	40 88 7c 06 80       	mov    BYTE PTR [rsi+rax*1-0x80],dil
    1065:	0f b6 5d cf          	movzx  ebx,BYTE PTR [rbp-0x31]
    1069:	48 b8 4a 00 00 00 00 	movabs rax,0xffff88000000004a
    1070:	88 ff ff
    1073:	88 5c 06 80          	mov    BYTE PTR [rsi+rax*1-0x80],bl
    1077:	0f b6 7d ce          	movzx  edi,BYTE PTR [rbp-0x32]
    107b:	48 b8 4b 00 00 00 00 	movabs rax,0xffff88000000004b
 ...

So change the code a bit, which makes GCC generate more reasonable code
like:

 ...
     bf1:	48 89 78 b8          	mov    QWORD PTR [rax-0x48],rdi
     bf5:	4c 89 60 c0          	mov    QWORD PTR [rax-0x40],r12
     bf9:	48 89 58 c8          	mov    QWORD PTR [rax-0x38],rbx
     bfd:	4c 89 58 d0          	mov    QWORD PTR [rax-0x30],r11
     c01:	4c 89 50 d8          	mov    QWORD PTR [rax-0x28],r10
 ...

This saves 2087 bytes of code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e03983973252..d665d2e74641 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
 	vaddr = kmap(page);
 
 	for (i = 0; i < PAGE_SIZE; i += 128) {
-		memcpy(temp, &vaddr[i], 64);
+		memcpy(&temp[0], &vaddr[i], 64);
 		memcpy(&vaddr[i], &vaddr[i + 64], 64);
-		memcpy(&vaddr[i + 64], temp, 64);
+		memcpy(&vaddr[i + 64], &temp[0], 64);
 	}
 
 	kunmap(page);
-- 
2.7.4
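
For anyone who wants to compare code generation outside the kernel, the
loop can be approximated by a userspace sketch along these lines. The
names swizzle_buffer and SWIZZLE_PAGE_SIZE are made up for illustration,
the 4096-byte size is an assumption standing in for PAGE_SIZE, and
kmap/kunmap are left out. Whether a given compiler shows the same
per-byte expansion with this standalone version is not something the
patch establishes.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stands in for the kernel's PAGE_SIZE in this sketch. */
#define SWIZZLE_PAGE_SIZE 4096

/*
 * Hypothetical userspace stand-in for the loop in i915_gem_swizzle_page():
 * swap the two 64-byte halves of every 128-byte block in the buffer.
 * vaddr must point to at least SWIZZLE_PAGE_SIZE bytes.
 */
void swizzle_buffer(char *vaddr)
{
	char temp[64];
	size_t i;

	for (i = 0; i < SWIZZLE_PAGE_SIZE; i += 128) {
		/* Same &temp[0] spelling as the patched kernel code. */
		memcpy(&temp[0], &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], &temp[0], 64);
	}
}
```

Compiling this to an object file and disassembling it with
objdump -d -M intel gives output in the same format as the listings
above. Calling the function twice on the same buffer restores the
original contents, which is a cheap sanity check.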