Why does a 32-bit program perform worse than a 64-bit program on the same 64-bit system when both are compiled from the same source? Which gcc option can fix it?

Hi guys:
   What does the number of stalled cycles in the CPU pipeline frontend
mean? Why does the 32-bit program have more stalled frontend cycles
than the 64-bit program when both run on the same 64-bit system?
Are there any gcc options to fix it?

linux-jjhr:/mnt/sda3/home/sean/suse_lab/test_32_64 # gcc -Wall test.c -o test64
linux-jjhr:/mnt/sda3/home/sean/suse_lab/test_32_64 # gcc -Wall test.c -o test32 -m32

linux-jjhr:/mnt/sda3/home/sean/suse_lab/test_32_64 # perf stat ./test64 1000000000

 Performance counter stats for './test64 1000000000':

      24650.018596 task-clock                #    0.999 CPUs utilized
             2,100 context-switches          #    0.000 M/sec
                 3 CPU-migrations            #    0.000 M/sec
               135 page-faults               #    0.000 M/sec
    71,966,342,812 cycles                    #    2.920 GHz                     [83.33%]
     6,369,556,234 stalled-cycles-frontend   #    8.85% frontend cycles idle    [83.33%]
     1,699,050,991 stalled-cycles-backend    #    2.36% backend  cycles idle    [66.67%]
   156,985,267,463 instructions              #    2.18  insns per cycle
                                             #    0.04  stalled cycles per insn [83.33%]
    35,472,160,125 branches                  # 1439.032 M/sec                   [83.33%]
         2,436,028 branch-misses             #    0.01% of all branches         [83.35%]

      24.674703793 seconds time elapsed

linux-jjhr:/mnt/sda3/home/sean/suse_lab/test_32_64 # perf stat ./test32 1000000000

 Performance counter stats for './test32 1000000000':

      54676.882729 task-clock                #    0.999 CPUs utilized
             4,657 context-switches          #    0.000 M/sec
                 7 CPU-migrations            #    0.000 M/sec
               116 page-faults               #    0.000 M/sec
   159,670,693,964 cycles                    #    2.920 GHz                     [83.33%]
    71,123,035,082 stalled-cycles-frontend   #   44.54% frontend cycles idle    [83.34%]
     7,119,090,236 stalled-cycles-backend    #    4.46% backend  cycles idle    [66.66%]
   204,576,003,586 instructions              #    1.28  insns per cycle
                                             #    0.35  stalled cycles per insn [83.33%]
    39,748,525,691 branches                  #  726.971 M/sec                   [83.34%]
         4,300,876 branch-misses             #    0.01% of all branches         [83.33%]

      54.731504570 seconds time elapsed
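
For reference, the percentages and ratios that perf prints can be re-derived from the raw counters. The following is a minimal Python sketch, assuming the counter values are copied by hand from the two runs above; the names `runs`, `derived`, `metrics`, `slowdown` and `extra_insns` are illustrative only and are not part of perf.

```python
# Raw counters copied by hand from the two `perf stat` runs above.
runs = {
    "test64": {
        "cycles": 71_966_342_812,
        "stalled_frontend": 6_369_556_234,
        "instructions": 156_985_267_463,
        "elapsed_s": 24.674703793,
    },
    "test32": {
        "cycles": 159_670_693_964,
        "stalled_frontend": 71_123_035_082,
        "instructions": 204_576_003_586,
        "elapsed_s": 54.731504570,
    },
}


def derived(counters):
    """Recompute the ratios perf shows next to the raw counters."""
    return {
        # Share of cycles in which the frontend was reported as stalled.
        "frontend_stall_pct": 100.0 * counters["stalled_frontend"] / counters["cycles"],
        # Instructions retired per cycle.
        "ipc": counters["instructions"] / counters["cycles"],
    }


metrics = {name: derived(c) for name, c in runs.items()}
for name, m in metrics.items():
    print(f"{name}: {m['frontend_stall_pct']:.2f}% frontend stalls, "
          f"{m['ipc']:.2f} insns per cycle")

# How much longer the 32-bit build ran, and how many more instructions it executed.
slowdown = runs["test32"]["elapsed_s"] / runs["test64"]["elapsed_s"]
extra_insns = runs["test32"]["instructions"] / runs["test64"]["instructions"]
print(f"test32 / test64: {slowdown:.2f}x elapsed time, {extra_insns:.2f}x instructions")
```

Going by these numbers, the 32-bit build executes roughly 30% more instructions but takes roughly 2.2 times as long, so the larger instruction count alone does not account for the slowdown.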



