> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote: >> [ +Peter ] >> >> So I dug some more (I’m still not done), and found various trivial things >> (e.g., storing zero extending u32 immediate is shorter for registers, >> inlining already takes place). >> >> *But* there is one thing that may require some attention - patch >> b59167ac7bafd ("x86/percpu: Fix this_cpu_read()”) set ordering constraints >> on the VM_ARGS() evaluation. And this patch also imposes, it appears, >> (unnecessary) constraints on other pieces of code. >> >> These constraints are due to the addition of the volatile keyword for >> this_cpu_read() by the patch. This affects at least 68 functions in my >> kernel build, some of which are hot (I think), e.g., finish_task_switch(), >> smp_x86_platform_ipi() and select_idle_sibling(). >> >> Peter, perhaps the solution was too big of a hammer? Is it possible instead >> to create a separate "this_cpu_read_once()” with the volatile keyword? Such >> a function can be used for native_sched_clock() and other seqlocks, etc. > > No. like the commit writes this_cpu_read() _must_ imply READ_ONCE(). If > you want something else, use something else, there's plenty other > options available. > > There's this_cpu_op_stable(), but also __this_cpu_read() and > raw_this_cpu_read() (which currently don't differ from this_cpu_read() > but could). Would setting the inline assembly memory operand both as input and output be better than using the “volatile”? I think that If you do that, the compiler would should the this_cpu_read() as something that changes the per-cpu-variable, which would make it invalid to re-read the value. At the same time, it would not prevent reordering the read with other stuff.