Hi Christoph,

On Sat, Aug 24, 2024 at 9:57 AM Christoph Lameter (Ampere) <cl@xxxxxxxxxx> wrote:
>
> On Sat, 24 Aug 2024, Yunhui Cui wrote:
>
> > Compared to directly fetching the per-CPU offset from memory (or cache),
> > using the global pointer (gp) to store the per-CPU offset can save one
> > memory access.
>
> Yes! That is a step in the right direction.
>
> Is there something like gp relative addressing so that we can do loads
> and stores relative to gp as well?
>
> Are there atomics that can do read modify write relative to GP? That would
> get you to comparable per cpu efficiency to x86. x86 can do relative
> addressing and RMW in one instruction which allows one to drop the preempt
> enable/disable since one instruction cannot be interrupted.

Your suggestion is excellent. If conditions permit, we can indeed move
closer to the x86 architecture.

Thanks,
Yunhui