Hi,

Sorry for the cross post and the long email :-)

I am currently working on a very early-stage build of Mandriva for ARM. Thanks to Jeff Johnson for giving me ssh access to armv7 hosts, and to Matthew Dawkins for building several Mandriva/Unity Linux armv5 packages.

What I am trying to understand now is the choice of float ABI. I understand that the IHI0042D_aapcs.pdf file I downloaded says to use VFP registers for float/double arguments, but softfp seems too good to miss: it keeps compatibility with armv5 binaries, and armv5 should be around for some time yet.

So, I have two chroots, running:

softfp# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)

thumb# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)

This is unmodified upstream gcc, built with a set of bootstrap scripts from a git branch I made on a checkout of

git clone git://fedorapeople.org/~djdelorie/bootstrap.git

I am still very much an "arm noob" :-) and only did the thumb build yesterday to learn about thumb, but so far my impression is that the best approach would be thumb+softfp.

Just so you know, the two builds differ in both respects: the thumb build uses hard float, and the softfp build uses the ARM instruction set:

softfp# objdump -d /usr/lib/libm.so | less
[...]
00008d30 <__ieee754_atan2>:
    8d30:  e3a0c000   mov    ip, #0
    8d34:  e347cff0   movt   ip, #32752   ; 0x7ff0
    8d38:  e92d4030   push   {r4, r5, lr}
    8d3c:  ed2d8b10   vpush  {d8-d15}
    8d40:  e3a05000   mov    r5, #0
    8d44:  ec432b18   vmov   d8, r2, r3
    8d48:  e3475ff0   movt   r5, #32752   ; 0x7ff0
    8d4c:  e003c00c   and    ip, r3, ip
    8d50:  e15c0005   cmp    ip, r5
    8d54:  e24dd02c   sub    sp, sp, #44  ; 0x2c
    8d58:  e1a04003   mov    r4, r3
    8d5c:  ec410b19   vmov   d9, r0, r1
    8d60:  e1a05002   mov    r5, r2
    8d64:  0a000022   beq    8df4 <__ieee754_atan2+0xc4>
[...]

thumb# objdump -d /usr/lib/libm.so | less
[...]
00007884 <__ieee754_atan2>:
    7884:  2100        movs      r1, #0
    7886:  2000        movs      r0, #0
    7888:  f6c7 71f0   movt      r1, #32752   ; 0x7ff0
    788c:  ec53 2b11   vmov      r2, r3, d1
    7890:  f6c7 70f0   movt      r0, #32752   ; 0x7ff0
    7894:  4019        ands      r1, r3
    7896:  4281        cmp       r1, r0
    7898:  e92d 03f0   stmdb     sp!, {r4, r5, r6, r7, r8, r9}
    789c:  ed2d 8b10   vpush     {d8-d15}
    78a0:  461c        mov       r4, r3
    78a2:  b08a        sub       sp, #40      ; 0x28
    78a4:  eeb0 8b41   vmov.f64  d8, d1
    78a8:  4616        mov       r6, r2
    78aa:  eeb0 9b40   vmov.f64  d9, d0
    78ae:  d03c        beq.n     792a <__ieee754_atan2+0xa6>
[...]

In the softfp build the two double arguments arrive in r0-r3 and are moved into d8/d9 with vmov, while in the hard float build they arrive directly in d0/d1.

I am trying to figure out what "The Industry" says about it, and just checked the Linaro gcc-4.6 changes that are relevant for me right now, which are:

+ --with-arch=armv7-a --with-tune=cortex-a8 \
+ --with-float=$(float_abi) --with-fpu=neon \

+# check if we're building for armel or armhf
+ifeq ($(DEB_TARGET_ARCH),armhf)
+ float_abi := hard
+else ifneq (,$(filter $(DEB_TARGET_ARCH), arm armel))
+ float_abi := softfp
+endif

If I understand correctly, neon will have better support for SIMD instructions, right?

Either way, I used two simple benchmarks to try to sell myself the idea of breaking compatibility with armv5 or older binaries, but I am still not convinced. Still, as I said, we should use whatever "The Industry" chooses :-)

For the benchmarks I used
http://www.tux.org/~mayer/linux/bmark.html
and
http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-matters/
I also compared against my home computer, a quad-core i5 x86_64, and attached the results.

Thanks, and again sorry for the cross posting and the long email,
Paulo
x86_64$ ./nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1240.2  :      31.81  :      10.45
STRING SORT         :          736.92  :     329.28  :      50.97
BITFIELD            :      5.4454e+08  :      93.41  :      19.51
FP EMULATION        :          279.16  :     133.95  :      30.91
FOURIER             :           33237  :      37.80  :      21.23
ASSIGNMENT          :          40.455  :     153.94  :      39.93
IDEA                :            8288  :     126.76  :      37.64
HUFFMAN             :          2796.6  :      77.55  :      24.76
NEURAL NET          :          71.531  :     114.91  :      48.34
LU DECOMPOSITION    :          1989.8  :     103.08  :      74.44
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 110.274
FLOATING-POINT INDEX: 76.500
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz 2793MHz
L2 Cache            : 8192 KB
OS                  : Linux 2.6.38.7-desktop-1mnb2
C compiler          : gcc version 4.6.1 20110722 (Mandriva) (GCC)
libc                : libc-2.14.90.so
MEMORY INDEX        : 34.115
INTEGER INDEX       : 23.422
FLOATING-POINT INDEX: 42.430
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

thumb$ ./nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           537.6  :      13.79  :       4.53
STRING SORT         :          61.711  :      27.57  :       4.27
BITFIELD            :      1.3356e+08  :      22.91  :       4.79
FP EMULATION        :          67.386  :      32.34  :       7.46
FOURIER             :          6144.4  :       6.99  :       3.92
ASSIGNMENT          :          7.4365  :      28.30  :       7.34
IDEA                :          1544.6  :      23.62  :       7.01
HUFFMAN             :          792.71  :      21.98  :       7.02
NEURAL NET          :          8.3201  :      13.37  :       5.62
LU DECOMPOSITION    :          290.72  :      15.06  :      10.88
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 23.650
FLOATING-POINT INDEX: 11.204
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual
L2 Cache            :
OS                  : Linux 2.6.38.8-32.01.fc13.armv7l.omap
C compiler          : gcc version 4.6.1 20110722 (Mandriva) (GCC)
libc                : libc-2.14.90.so
MEMORY INDEX        : 5.312
INTEGER INDEX       : 6.386
FLOATING-POINT INDEX: 6.214
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

softfp$ ./nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           521.4  :      13.37  :       4.39
STRING SORT         :           62.71  :      28.02  :       4.34
BITFIELD            :      1.9979e+08  :      34.27  :       7.16
FP EMULATION        :          84.446  :      40.52  :       9.35
FOURIER             :          6379.4  :       7.26  :       4.08
ASSIGNMENT          :          7.4291  :      28.27  :       7.33
IDEA                :          1256.5  :      19.22  :       5.71
HUFFMAN             :          874.75  :      24.26  :       7.75
NEURAL NET          :          9.1634  :      14.72  :       6.19
LU DECOMPOSITION    :          274.57  :      14.22  :      10.27
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 25.419
FLOATING-POINT INDEX: 11.495
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual
L2 Cache            :
OS                  : Linux 2.6.38.8-32.01.fc13.armv7l.omap
C compiler          : gcc version 4.6.1 20110722 (Mandriva) (GCC)
libc                : libc-2.14.90.so
MEMORY INDEX        : 6.106
INTEGER INDEX       : 6.527
FLOATING-POINT INDEX: 6.376
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
x86_64# ./bench
nTimes=93750 16: Dot with C code => (flops 5154.639160 : time:0.000582 us)
nTimes=61225 16: Distance with C code => (flops 1670.392456 : time:0.001796 us)
nTimes=46875 32: Dot with C code => (flops 5347.593262 : time:0.000561 us)
nTimes=30928 32: Distance with C code => (flops 1625.144043 : time:0.001846 us)
nTimes=23438 64: Dot with C code => (flops 5217.502930 : time:0.000575 us)
nTimes=15545 64: Distance with C code => (flops 1446.569458 : time:0.002074 us)
nTimes=11719 128: Dot with C code => (flops 5008.454102 : time:0.000599 us)
nTimes=7793 128: Distance with C code => (flops 1433.494873 : time:0.002093 us)
nTimes=5860 256: Dot with C code => (flops 4688.000000 : time:0.00064 us)
nTimes=3902 256: Distance with C code => (flops 1455.913574 : time:0.002061 us)
nTimes=2930 512: Dot with C code => (flops 4464.761719 : time:0.000672 us)
nTimes=1952 512: Distance with C code => (flops 1446.588257 : time:0.002074 us)
nTimes=1465 1024: Dot with C code => (flops 4317.007324 : time:0.000695 us)
nTimes=977 1024: Distance with C code => (flops 1389.320312 : time:0.002161 us)
nTimes=733 2048: Dot with C code => (flops 4240.632812 : time:0.000708 us)
nTimes=489 2048: Distance with C code => (flops 1412.079468 : time:0.002128 us)
nTimes=367 4096: Dot with C code => (flops 4234.456543 : time:0.00071 us)
nTimes=245 4096: Distance with C code => (flops 1440.576538 : time:0.00209 us)
nTimes=184 8192: Dot with C code => (flops 4222.207031 : time:0.000714 us)
nTimes=123 8192: Distance with C code => (flops 1438.824829 : time:0.002101 us)
nTimes=92 16384: Dot with C code => (flops 4251.982910 : time:0.000709 us)
nTimes=62 16384: Distance with C code => (flops 1437.493408 : time:0.00212 us)
nTimes=46 32768: Dot with C code => (flops 4210.413574 : time:0.000716 us)
nTimes=31 32768: Distance with C code => (flops 1384.577393 : time:0.002201 us)
nTimes=23 65536: Dot with C code => (flops 4198.685059 : time:0.000718 us)
nTimes=16 65536: Distance with C code => (flops 1453.671021 : time:0.002164 us)
16, 5154.639160, 1670.392456,
32, 5347.593262, 1625.144043,
64, 5217.502930, 1446.569458,
128, 5008.454102, 1433.494873,
256, 4688.000000, 1455.913574,
512, 4464.761719, 1446.588257,
1024, 4317.007324, 1389.320312,
2048, 4240.632812, 1412.079468,
4096, 4234.456543, 1440.576538,
8192, 4222.207031, 1438.824829,
16384, 4251.982910, 1437.493408,
32768, 4210.413574, 1384.577393,
65536, 4198.685059, 1453.671021,

x86_64# ./cfft
nTimes=6250 N=16: (flops 1865.671631 : time:0.001072 us)
nTimes=2500 N=32: (flops 1932.367188 : time:0.001035 us)
nTimes=1042 N=64: (flops 1830.411621 : time:0.001093 us)
nTimes=447 N=128: (flops 1994.581787 : time:0.001004 us)
nTimes=196 N=256: (flops 1933.564575 : time:0.001038 us)
nTimes=87 N=512: (flops 2088.000000 : time:0.00096 us)
nTimes=40 N=1024: (flops 2140.020752 : time:0.000957 us)
nTimes=18 N=2048: (flops 2054.225098 : time:0.000987 us)
nTimes=9 N=4096: (flops 2036.685059 : time:0.001086 us)
nTimes=4 N=8192: (flops 1883.218384 : time:0.001131 us)
nTimes=2 N=16384: (flops 1940.575195 : time:0.001182 us)
nTimes=1 N=32768: (flops 1842.278809 : time:0.001334 us)
nTimes=1 N=65536: (flops 1937.501831 : time:0.002706 us)
16, 1865.671631
32, 1932.367188
64, 1830.411621
128, 1994.581787
256, 1933.564575
512, 2088.000000
1024, 2140.020752
2048, 2054.225098
4096, 2036.685059
8192, 1883.218384
16384, 1940.575195
32768, 1842.278809
65536, 1937.501831

thumb# ./bench
nTimes=93750 16: Dot with C code => (flops 630.119690 : time:0.004761 us)
nTimes=61225 16: Distance with C code => (flops 300.603699 : time:0.00998 us)
nTimes=46875 32: Dot with C code => (flops 646.691040 : time:0.004639 us)
nTimes=30928 32: Distance with C code => (flops 310.111237 : time:0.009674 us)
nTimes=23438 64: Dot with C code => (flops 702.262207 : time:0.004272 us)
nTimes=15545 64: Distance with C code => (flops 308.185394 : time:0.009735 us)
nTimes=11719 128: Dot with C code => (flops 750.391174 : time:0.003998 us)
nTimes=7793 128: Distance with C code => (flops 311.105896 : time:0.009644 us)
nTimes=5860 256: Dot with C code => (flops 756.319641 : time:0.003967 us)
nTimes=3902 256: Distance with C code => (flops 313.120941 : time:0.009583 us)
nTimes=2930 512: Dot with C code => (flops 707.288940 : time:0.004242 us)
nTimes=1952 512: Distance with C code => (flops 313.077728 : time:0.009583 us)
nTimes=1465 1024: Dot with C code => (flops 707.122314 : time:0.004243 us)
nTimes=977 1024: Distance with C code => (flops 313.296570 : time:0.009583 us)
nTimes=733 2048: Dot with C code => (flops 723.287842 : time:0.004151 us)
nTimes=489 2048: Distance with C code => (flops 310.616608 : time:0.009674 us)
nTimes=367 4096: Dot with C code => (flops 757.868347 : time:0.003967 us)
nTimes=245 4096: Distance with C code => (flops 312.194611 : time:0.009644 us)
nTimes=184 8192: Dot with C code => (flops 742.709045 : time:0.004059 us)
nTimes=123 8192: Distance with C code => (flops 309.540314 : time:0.009766 us)
nTimes=92 16384: Dot with C code => (flops 457.320374 : time:0.006592 us)
nTimes=62 16384: Distance with C code => (flops 284.492737 : time:0.010712 us)
nTimes=46 32768: Dot with C code => (flops 470.378540 : time:0.006409 us)
nTimes=31 32768: Distance with C code => (flops 280.483643 : time:0.010865 us)
nTimes=23 65536: Dot with C code => (flops 415.070343 : time:0.007263 us)
nTimes=16 65536: Distance with C code => (flops 278.581635 : time:0.011292 us)
16, 630.119690, 300.603699,
32, 646.691040, 310.111237,
64, 702.262207, 308.185394,
128, 750.391174, 311.105896,
256, 756.319641, 313.120941,
512, 707.288940, 313.077728,
1024, 707.122314, 313.296570,
2048, 723.287842, 310.616608,
4096, 757.868347, 312.194611,
8192, 742.709045, 309.540314,
16384, 457.320374, 284.492737,
32768, 470.378540, 280.483643,
65536, 415.070343, 278.581635,

thumb# ./cfft
nTimes=6250 N=16: (flops 229.937912 : time:0.008698 us)
nTimes=2500 N=32: (flops 248.200531 : time:0.008058 us)
nTimes=1042 N=64: (flops 259.116699 : time:0.007721 us)
nTimes=447 N=128: (flops 255.298325 : time:0.007844 us)
nTimes=196 N=256: (flops 262.015656 : time:0.00766 us)
nTimes=87 N=512: (flops 262.744781 : time:0.007629 us)
nTimes=40 N=1024: (flops 263.171417 : time:0.007782 us)
nTimes=18 N=2048: (flops 259.539154 : time:0.007812 us)
nTimes=9 N=4096: (flops 229.348816 : time:0.009644 us)
nTimes=4 N=8192: (flops 202.290817 : time:0.010529 us)
nTimes=2 N=16384: (flops 200.415909 : time:0.011445 us)
nTimes=1 N=32768: (flops 189.922714 : time:0.01294 us)
nTimes=1 N=65536: (flops 169.584686 : time:0.030916 us)
16, 229.937912
32, 248.200531
64, 259.116699
128, 255.298325
256, 262.015656
512, 262.744781
1024, 263.171417
2048, 259.539154
4096, 229.348816
8192, 202.290817
16384, 200.415909
32768, 189.922714
65536, 169.584686

softfp# ./bench
nTimes=93750 16: Dot with C code => (flops 626.174072 : time:0.004791 us)
nTimes=61225 16: Distance with C code => (flops 270.809235 : time:0.011078 us)
nTimes=46875 32: Dot with C code => (flops 655.451111 : time:0.004577 us)
nTimes=30928 32: Distance with C code => (flops 290.840118 : time:0.010315 us)
nTimes=23438 64: Dot with C code => (flops 682.608398 : time:0.004395 us)
nTimes=15545 64: Distance with C code => (flops 302.468506 : time:0.009919 us)
nTimes=11719 128: Dot with C code => (flops 717.546997 : time:0.004181 us)
nTimes=7793 128: Distance with C code => (flops 305.312408 : time:0.009827 us)
nTimes=5860 256: Dot with C code => (flops 756.129028 : time:0.003968 us)
nTimes=3902 256: Distance with C code => (flops 309.184753 : time:0.009705 us)
nTimes=2930 512: Dot with C code => (flops 756.319641 : time:0.003967 us)
nTimes=1952 512: Distance with C code => (flops 313.077728 : time:0.009583 us)
nTimes=1465 1024: Dot with C code => (flops 697.262390 : time:0.004303 us)
nTimes=977 1024: Distance with C code => (flops 316.333466 : time:0.009491 us)
nTimes=733 2048: Dot with C code => (flops 687.827698 : time:0.004365 us)
nTimes=489 2048: Distance with C code => (flops 311.582855 : time:0.009644 us)
nTimes=367 4096: Dot with C code => (flops 719.077759 : time:0.004181 us)
nTimes=245 4096: Distance with C code => (flops 309.244568 : time:0.009736 us)
nTimes=184 8192: Dot with C code => (flops 567.624878 : time:0.005311 us)
nTimes=123 8192: Distance with C code => (flops 293.037109 : time:0.010316 us)
nTimes=92 16384: Dot with C code => (flops 459.411133 : time:0.006562 us)
nTimes=62 16384: Distance with C code => (flops 283.671814 : time:0.010743 us)
nTimes=46 32768: Dot with C code => (flops 440.932587 : time:0.006837 us)
nTimes=31 32768: Distance with C code => (flops 282.878937 : time:0.010773 us)
nTimes=23 65536: Dot with C code => (flops 403.190582 : time:0.007477 us)
nTimes=16 65536: Distance with C code => (flops 279.323730 : time:0.011262 us)
16, 626.174072, 270.809235,
32, 655.451111, 290.840118,
64, 682.608398, 302.468506,
128, 717.546997, 305.312408,
256, 756.129028, 309.184753,
512, 756.319641, 313.077728,
1024, 697.262390, 316.333466,
2048, 687.827698, 311.582855,
4096, 719.077759, 309.244568,
8192, 567.624878, 293.037109,
16384, 459.411133, 283.671814,
32768, 440.932587, 282.878937,
65536, 403.190582, 279.323730,

softfp# ./cfft
nTimes=6250 N=16: (flops 249.190125 : time:0.008026 us)
nTimes=2500 N=32: (flops 260.044220 : time:0.007691 us)
nTimes=1042 N=64: (flops 265.407257 : time:0.007538 us)
nTimes=447 N=128: (flops 268.944427 : time:0.007446 us)
nTimes=196 N=256: (flops 270.636444 : time:0.007416 us)
nTimes=87 N=512: (flops 272.532959 : time:0.007355 us)
nTimes=40 N=1024: (flops 272.775726 : time:0.007508 us)
nTimes=18 N=2048: (flops 261.547974 : time:0.007752 us)
nTimes=9 N=4096: (flops 229.348816 : time:0.009644 us)
nTimes=4 N=8192: (flops 210.217133 : time:0.010132 us)
nTimes=2 N=16384: (flops 197.261780 : time:0.011628 us)
nTimes=1 N=32768: (flops 190.364059 : time:0.01291 us)
nTimes=1 N=65536: (flops 170.428116 : time:0.030763 us)
16, 249.190125
32, 260.044220
64, 265.407257
128, 268.944427
256, 270.636444
512, 272.532959
1024, 272.775726
2048, 261.547974
4096, 229.348816
8192, 210.217133
16384, 197.261780
32768, 190.364059
65536, 170.428116
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/arm