Hello Vlastimil.

On Mon, Oct 11, 2021 at 09:21:01AM +0200, Vlastimil Babka wrote:
> On 10/11/21 00:49, David Rientjes wrote:
> > On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> >
> >> It's certain that an object will be not only read, but also
> >> written after allocation.
> >>
> >
> > Why is it certain?  I think perhaps what you meant to say is that if we
> > are doing any prefetching here, then access will benefit from prefetchw
> > instead of prefetch.  But it's not "certain" that allocated memory will be
> > accessed at all.
>
> I think the primary reason there's a prefetch is freelist traversal. The
> cacheline we prefetch will be read during the next allocation, so if we
> expect there to be one soon, prefetch might help.

I agree with that.

> That the freepointer is
> part of object itself and thus the cache line will be probably accessed also
> after the allocation, is secondary.

Right. It depends on the cache line size and on whether the first cache
line of an object is frequently accessed or not.

> Yeah this might help some workloads, but
> perhaps hurt others - these things might look obvious in theory but be
> rather unpredictable in practice. At least some hackbench results would help...
>

Below is my measurement. It seems prefetch(w) is not making things worse,
at least on hackbench.
Measured on 16 CPUs (ARM64) / 16G RAM

Without prefetch:

Time: 91.989

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1467926.03 msec cpu-clock                 #   15.907 CPUs utilized
          17782076      context-switches          #   12.114 K/sec
            957523      cpu-migrations            #  652.296 /sec
            104561      page-faults               #   71.230 /sec
     1622117569931      cycles                    #    1.105 GHz                      (54.54%)
     2002981132267      instructions              #    1.23  insn per cycle           (54.32%)
        5600876429      branch-misses                                                 (54.28%)
      642657442307      cache-references          #  437.800 M/sec                    (54.27%)
       19404890844      cache-misses              #    3.019 % of all cache refs      (54.28%)
      640413686039      L1-dcache-loads           #  436.271 M/sec                    (46.85%)
       19110650580      L1-dcache-load-misses     #    2.98% of all L1-dcache accesses  (46.83%)
      651556334841      dTLB-loads                #  443.862 M/sec                    (46.63%)
        3193647402      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.84%)
      538927659684      iTLB-loads                #  367.135 M/sec                    (54.31%)
         118503839      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      625750168840      L1-icache-loads           #  426.282 M/sec                    (46.80%)
       24348083282      L1-icache-load-misses     #    3.89% of all L1-icache accesses  (46.78%)

      92.284351157 seconds time elapsed

      44.524693000 seconds user
    1426.214006000 seconds sys

With prefetch:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized
          18072550      context-switches          #   12.354 K/sec
           1018814      cpu-migrations            #  696.416 /sec
            104558      page-faults               #   71.471 /sec
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)

      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys

With prefetchw:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized
          17694068      context-switches          #   12.310 K/sec
            958257      cpu-migrations            #  666.651 /sec
            100604      page-faults               #   69.989 /sec
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)

      90.514819303 seconds time elapsed

      43.877656000 seconds user
    1397.176001000 seconds sys

Thanks,
Hyeonggon