On Fri, 25 Sep 2020 00:03:14 +0200 Daniel Borkmann wrote:
> static inline u64 gen_cookie_next(struct gen_cookie *gc)
> {
> 	u64 val;
>
> 	if (likely(this_cpu_inc_return(*gc->level_nesting) == 1)) {

Is this_cpu_inc() in itself atomic?

Is there a comparison of performance of various atomic ops and locking
somewhere? I wonder how this scheme would compare to using a cmpxchg.

> 		u64 *local_last = this_cpu_ptr(gc->local_last);
>
> 		val = *local_last;
> 		if (__is_defined(CONFIG_SMP) &&
> 		    unlikely((val & (COOKIE_LOCAL_BATCH - 1)) == 0)) {

Can we reasonably assume we won't have more than 4k CPUs and just
statically divide this space by encoding CPU id in top bits?

> 			s64 next = atomic64_add_return(COOKIE_LOCAL_BATCH,
> 						       &gc->shared_last);
> 			val = next - COOKIE_LOCAL_BATCH;
> 		}
> 		val++;
> 		if (unlikely(!val))
> 			val++;
> 		*local_last = val;
> 	} else {
> 		val = atomic64_add_return(COOKIE_LOCAL_BATCH,
> 					  &gc->shared_last);
> 	}
> 	this_cpu_dec(*gc->level_nesting);
> 	return val;
> }
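
To make the static-split question concrete, here is a rough userspace sketch of what I have in mind. Everything in it is an assumption for illustration: the names (COOKIE_CPU_BITS, cookie_next_static) are hypothetical, the 12/52 bit split simply follows from the "4k CPUs" guess, and local_seq stands in for a per-CPU variable. It ignores the nested-context case that level_nesting handles in the patch, and cookies would only be increasing per CPU, not globally.

```c
#include <stdint.h>

/* Hypothetical layout: top 12 bits hold the CPU id (up to 4096 CPUs),
 * low 52 bits hold a per-CPU sequence number.
 */
#define COOKIE_CPU_BITS	12
#define COOKIE_SEQ_BITS	(64 - COOKIE_CPU_BITS)
#define COOKIE_SEQ_MASK	((UINT64_C(1) << COOKIE_SEQ_BITS) - 1)

/* local_seq stands in for a per-CPU counter; cpu must be < 4096.
 * No shared atomic is touched, since each CPU owns its own range.
 */
static inline uint64_t cookie_next_static(unsigned int cpu, uint64_t *local_seq)
{
	uint64_t seq = (*local_seq + 1) & COOKIE_SEQ_MASK;

	/* Skip 0 on wrap so that CPU 0 never hands out an all-zero cookie. */
	if (seq == 0)
		seq = 1;
	*local_seq = seq;
	return ((uint64_t)cpu << COOKIE_SEQ_BITS) | seq;
}
```

The trade-off is that each CPU gets 2^52 values before its sequence wraps, in exchange for never touching shared_last on the fast path.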