Perf and Hackbench results on my machine

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Vlastimil.

On Mon, Oct 11, 2021 at 09:21:01AM +0200, Vlastimil Babka wrote:
> On 10/11/21 00:49, David Rientjes wrote:
> > On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> > 
> >> It's certain that an object will be not only read, but also
> >> written after allocation.
> >> 
> > 
> > Why is it certain?  I think perhaps what you meant to say is that if we 
> > are doing any prefetching here, then access will benefit from prefetchw 
> > instead of prefetch.  But it's not "certain" that allocated memory will be 
> > accessed at all.
> 
> I think the primary reason there's a prefetch is freelist traversal. The
> cacheline we prefetch will be read during the next allocation, so if we
> expect there to be one soon, prefetch might help.

I agree that.

> That the freepointer is
> part of object itself and thus the cache line will be probably accessed also
> after the allocation, is secondary.

Right. it depends on cache line size and whether first cache line of an
object is frequently accessed or not.

> Yeah this might help some workloads, but
> perhaps hurt others - these things might look obvious in theory but be
> rather unpredictable in practice. At least some hackbench results would help...
>

Below is my measurement. it seems prefetch(w) is not making things worse
at least on hackbench.

Measured on 16 CPUs (ARM64) / 16G RAM
Without prefetch:

Time: 91.989
 Performance counter stats for 'hackbench -g 100 -l 10000':
        1467926.03 msec cpu-clock                 #   15.907 CPUs utilized          
          17782076      context-switches          #   12.114 K/sec                  
            957523      cpu-migrations            #  652.296 /sec                   
            104561      page-faults               #   71.230 /sec                   
     1622117569931      cycles                    #    1.105 GHz                      (54.54%)
     2002981132267      instructions              #    1.23  insn per cycle           (54.32%)
        5600876429      branch-misses                                                 (54.28%)
      642657442307      cache-references          #  437.800 M/sec                    (54.27%)
       19404890844      cache-misses              #    3.019 % of all cache refs      (54.28%)
      640413686039      L1-dcache-loads           #  436.271 M/sec                    (46.85%)
       19110650580      L1-dcache-load-misses     #    2.98% of all L1-dcache accesses  (46.83%)
      651556334841      dTLB-loads                #  443.862 M/sec                    (46.63%)
        3193647402      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.84%)
      538927659684      iTLB-loads                #  367.135 M/sec                    (54.31%)
         118503839      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      625750168840      L1-icache-loads           #  426.282 M/sec                    (46.80%)
       24348083282      L1-icache-load-misses     #    3.89% of all L1-icache accesses  (46.78%)

      92.284351157 seconds time elapsed

      44.524693000 seconds user
    1426.214006000 seconds sys

With prefetch:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized          
          18072550      context-switches          #   12.354 K/sec                  
           1018814      cpu-migrations            #  696.416 /sec                   
            104558      page-faults               #   71.471 /sec                   
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)
 
      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys
	
With prefetchw:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized          
          17694068      context-switches          #   12.310 K/sec                  
            958257      cpu-migrations            #  666.651 /sec                   
            100604      page-faults               #   69.989 /sec                   
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)
 
      90.514819303 seconds time elapsed
 
      43.877656000 seconds user
    1397.176001000 seconds sys

Thanks,
Hyeonggon




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux