On Fri, 20 Dec 2019, Tejun Heo wrote:

> On Fri, Dec 20, 2019 at 10:34:20AM +0100, Jesper Dangaard Brouer wrote:
> > > So, my question to the uarch/percpu folks out there: Why are percpu
> > > accesses (%gs segment register) more expensive than regular global
> > > variables in this scenario.
> >
> > I'm also VERY interested in knowing the answer to above question!?
> > (Adding LKML to reach more people)
>
> No idea. One difference is that percpu accesses are through vmap area
> which is mapped using 4k pages while global variable would be accessed
> through the fault linear mapping. Maybe you're getting hit by tlb
> pressure?

And there are some accesses from remote processors to the per cpu areas
of other cpus. If those hit the same cacheline as the local accesses,
they will cause additional latencies.
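
To illustrate the cacheline point, here is a minimal userspace sketch, not
kernel code. It assumes 64-byte cache lines (typical on x86-64, but not
guaranteed), and the struct and function names are made up for the example.
It compares a layout where a locally updated field and a remotely read field
share a line with one where the remote field is pushed onto its own line
using C11 alignas. The kernel has its own annotations for this; the sketch
does not use them.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Adjust for other hardware. */
#define CACHELINE_BYTES 64

/* Hypothetical layout: both fields fall into the same cache line. */
struct stats_shared_line {
	unsigned long local_hits;      /* updated by the owning cpu */
	unsigned long remote_visible;  /* read or written by other cpus */
};

/* Hypothetical layout: the remotely accessed field gets its own line. */
struct stats_split_lines {
	unsigned long local_hits;
	alignas(CACHELINE_BYTES) unsigned long remote_visible;
};

/* Cache line index of a field offset, relative to a line-aligned base. */
static size_t cacheline_index(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Returns 1 if the two fields of the packed layout share a cache line. */
int fields_share_line_packed(void)
{
	return cacheline_index(offsetof(struct stats_shared_line, local_hits)) ==
	       cacheline_index(offsetof(struct stats_shared_line, remote_visible));
}

/* Returns 1 if the two fields of the split layout share a cache line. */
int fields_share_line_split(void)
{
	return cacheline_index(offsetof(struct stats_split_lines, local_hits)) ==
	       cacheline_index(offsetof(struct stats_split_lines, remote_visible));
}
```

In the packed layout a remote access to remote_visible touches the line
that the owning cpu keeps modifying through local_hits. The split layout
avoids that at the cost of extra padding per instance.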