On 12/20/19 7:12 AM, Tejun Heo wrote:
> On Fri, Dec 20, 2019 at 10:34:20AM +0100, Jesper Dangaard Brouer wrote:
>>> So, my question to the uarch/percpu folks out there: Why are percpu
>>> accesses (%gs segment register) more expensive than regular global
>>> variables in this scenario.
>>
>> I'm also VERY interested in knowing the answer to above question!?
>> (Adding LKML to reach more people)
>
> No idea. One difference is that percpu accesses are through vmap area
> which is mapped using 4k pages while global variable would be accessed
> through the fault linear mapping. Maybe you're getting hit by tlb
> pressure?

I have definitely seen expensive per-cpu updates in the stack
(SNMP counters, or per-cpu stats for packets/bytes counters).

It might be nice to have an option to use 2M pages.

(I recall sending some patches in the past about using high-order pages
for vmalloc, but this went nowhere.)
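
To put rough numbers on the TLB-pressure idea, here is a back-of-envelope sketch. It is only arithmetic, not a measurement: the 256 KiB hot per-cpu working set and the 64-entry dTLB are made-up values for illustration, and the helpers `pages_needed` and `tlb_reach` are hypothetical names, not kernel APIs. It shows how many TLB entries the same region costs with 4K versus 2M mappings.

```python
# Back-of-envelope: how many TLB entries does it take to map a region?
# All sizes below are illustrative assumptions, not measurements.

KIB = 1024
MIB = 1024 * KIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (hence distinct TLB entries) covering the region."""
    return -(-region_bytes // page_bytes)  # ceiling division


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space a TLB with `entries` slots can map at once."""
    return entries * page_bytes


if __name__ == "__main__":
    percpu_working_set = 256 * KIB  # hypothetical hot per-cpu data per CPU
    tlb_entries = 64                # hypothetical first-level dTLB size

    for name, page in (("4K", 4 * KIB), ("2M", 2 * MIB)):
        print(
            f"{name} pages: {pages_needed(percpu_working_set, page)} entries "
            f"to cover {percpu_working_set // KIB} KiB, "
            f"TLB reach {tlb_reach(tlb_entries, page) // KIB} KiB"
        )
```

Under these assumed numbers the 4K mapping would need every entry of the hypothetical TLB just for the per-cpu data, while a single 2M entry would cover it. Real behaviour depends on the actual TLB hierarchy and on what else competes for entries, so this only illustrates why 2M pages could help.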