> On Aug 21, 2015, at 14:41, Tomi Valkeinen <tomi.valkeinen@xxxxxx> wrote:
>
> On 20/08/15 14:30, yalin wang wrote:
>>
>>> On Aug 20, 2015, at 19:02, Tomi Valkeinen <tomi.valkeinen@xxxxxx> wrote:
>>>
>>> On 10/08/15 13:12, yalin wang wrote:
>>>> This change to use swab32(bitrev32()) to implement reverse_order()
>>>> function, have better performance on some platforms.
>>>
>>> Which platforms? Presuming you tested this, roughly how much better
>>> performance? If you didn't, how do you know it's faster?
>>
>> i investigate on arm64 platforms:
>
> Ok. So is any arm64 platform actually using these devices? If these
> devices are mostly used by 32bit x86 platforms, optimizing them for
> arm64 doesn't make any sense.
>
> Possibly the patches are still good for x86 also, but that needs to be
> proven.

Not exactly: x86_64 doesn't have a hardware instruction for the rbit
operation, so I compared the compiled code instead.

With the patch, which uses swab32(bitrev32()):

2775: 0f b6 d0              movzbl %al,%edx
2778: 0f b6 c4              movzbl %ah,%eax
277b: 0f b6 92 00 00 00 00  movzbl 0x0(%rdx),%edx
2782: 0f b6 80 00 00 00 00  movzbl 0x0(%rax),%eax
2789: c1 e2 08              shl    $0x8,%edx
278c: 09 d0                 or     %edx,%eax
278e: 0f b6 d5              movzbl %ch,%edx
2791: 0f b6 c9              movzbl %cl,%ecx
2794: 0f b6 89 00 00 00 00  movzbl 0x0(%rcx),%ecx
279b: 0f b6 92 00 00 00 00  movzbl 0x0(%rdx),%edx
27a2: 0f b7 c0              movzwl %ax,%eax
27a5: c1 e1 08              shl    $0x8,%ecx
27a8: 09 ca                 or     %ecx,%edx
27aa: c1 e2 10              shl    $0x10,%edx
27ad: 09 d0                 or     %edx,%eax
27af: 45 85 ff              test   %r15d,%r15d
27b2: 0f c8                 bswap  %eax

That is 4 memory access instructions.

Without the patch, which uses:

do { \
-	u8 *a = (u8 *)(l); \
-	a[0] = bitrev8(a[0]); \
-	a[1] = bitrev8(a[1]); \
-	a[2] = bitrev8(a[2]); \
-	a[3] = bitrev8(a[3]); \
-} while(0)

277b: 45 0f b6 80 00 00 00  movzbl 0x0(%r8),%r8d
2782: 00
2783: c1 ee 10              shr    $0x10,%esi
2786: 89 f2                 mov    %esi,%edx
2788: 0f b6 f4              movzbl %ah,%esi
278b: c1 e8 18              shr    $0x18,%eax
278e: 0f b6 d2              movzbl %dl,%edx
2791: 48 98                 cltq
2793: 45 85 ed              test   %r13d,%r13d
2796: 0f b6 92 00 00 00 00  movzbl 0x0(%rdx),%edx
279d: 0f b6 80 00 00 00 00  movzbl 0x0(%rax),%eax
27a4: 44 88 85 54 ff ff ff  mov    %r8b,-0xac(%rbp)
27ab: 44 0f b6 86 00 00 00  movzbl 0x0(%rsi),%r8d
27b2: 00
27b3: 88 95 56 ff ff ff     mov    %dl,-0xaa(%rbp)
27b9: 88 85 57 ff ff ff     mov    %al,-0xa9(%rbp)
27bf: 44 88 85 55 ff ff ff  mov    %r8b,-0xab(%rbp)

That is 6 memory access instructions, and it generates more code than the
patched version. Since the original code makes four separate byte accesses,
I don't think it has better performance. :)

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html