On Sat, Jun 08, 2019 at 01:08:52PM +0200, Heiko Carstens wrote:
> --- a/arch/s390/kernel/processor.c
> +++ b/arch/s390/kernel/processor.c
> @@ -31,6 +31,7 @@ struct cpu_info {
>  };
>
>  static DEFINE_PER_CPU(struct cpu_info, cpu_info);
> +static DEFINE_PER_CPU(int, cpu_relax_retry);
>
>  static bool machine_has_cpu_mhz;
>
> @@ -58,13 +59,21 @@ void s390_update_cpu_mhz(void)
>  		on_each_cpu(update_cpu_mhz, NULL, 0);
>  }
>
> +void notrace cpu_relax_yield(const struct cpumask *cpumask)
>  {
> +	int cpu;
> +
> +	if (__this_cpu_inc_return(cpu_relax_retry) >= spin_retry) {
> +		__this_cpu_write(cpu_relax_retry, 0);

I don't mind, but do we really need a per-cpu variable for this? Does it
really matter if you spin on a stack variable and occasionally spin a
bit longer due to the missed tail of the previous spin?

> +		cpu = cpumask_next(smp_processor_id(), cpumask);
> +		if (cpu >= nr_cpu_ids) {
> +			cpu = cpumask_first(cpumask);
> +			if (cpu == smp_processor_id())
> +				return;

If this function is passed an empty cpumask, the above will result in
'cpu == nr_cpu_ids' and the below might be unhappy with that.

(FWIW we do have cpumask_next_wrap(), but I admit it is somewhat awkward
to use)

> +		}
> +		if (arch_vcpu_is_preempted(cpu))
> +			smp_yield_cpu(cpu);
>  	}
>  }
>  EXPORT_SYMBOL(cpu_relax_yield);
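
To make the empty-cpumask concern concrete, here is a minimal user-space
sketch of a wrap-around search that reports "no candidate" explicitly.
It is not kernel code and does not reproduce the semantics of
cpumask_next_wrap(): the 64-bit mask, NR_CPU_IDS and next_cpu_wrap() are
hypothetical stand-ins chosen for illustration. The point is only that
an empty mask and a mask containing just the current CPU both end up at
the same sentinel, which the caller can check before using the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids; the mask is modelled as 64 bits. */
#define NR_CPU_IDS 64u

/*
 * Return the first CPU set in 'mask' after 'self', wrapping around once
 * and skipping 'self' itself. Returns NR_CPU_IDS when the mask is empty
 * or contains only 'self', so the caller can bail out before passing
 * the result on.
 */
unsigned int next_cpu_wrap(uint64_t mask, unsigned int self)
{
	unsigned int i, cpu;

	assert(self < NR_CPU_IDS);

	/* Offsets 1..NR_CPU_IDS-1 visit every CPU except 'self' once. */
	for (i = 1; i < NR_CPU_IDS; i++) {
		cpu = (self + i) % NR_CPU_IDS;
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return NR_CPU_IDS;
}
```

With a helper shaped like this, a caller would compare the return value
against NR_CPU_IDS and return early, which covers both the "only me" and
the "nobody" case with a single test.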