On Mon, Oct 18, 2021 at 8:14 PM Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> wrote:
>
> On 2021/10/19 9:57 AM, Alexei Starovoitov wrote:
> > On Sun, Oct 17, 2021 at 10:49 PM Chengming Zhou
> > <zhouchengming@xxxxxxxxxxxxx> wrote:
> >>
> >> On 2021/10/16 3:58 AM, Alexei Starovoitov wrote:
> >>> On Fri, Oct 15, 2021 at 11:04 AM Chengming Zhou
> >>> <zhouchengming@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> We only use count for the kmalloc hashtab, not for the prealloc
> >>>> hashtab, because __pcpu_freelist_pop() returns NULL when there are no
> >>>> more elems in the pcpu freelist.
> >>>>
> >>>> But the problem is that __pcpu_freelist_pop() will traverse all CPUs
> >>>> and take the spin_lock of every CPU, only to find in the end that
> >>>> there are no more elems.
> >>>>
> >>>> We encountered a bad case on a big system with 96 CPUs where
> >>>> alloc_htab_elem() could last for 1ms. This patch uses count for the
> >>>> prealloc hashtab too, avoiding the traversal and spin_lock of all
> >>>> CPUs in this case.
> >>>>
> >>>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> >>>
> >>> It's not clear from the commit log what you're solving.
> >>> The atomic inc/dec in the critical path of prealloc maps hurts
> >>> performance. That's why it's not used.
> >>>
> >> Thanks for the explanation. What I'm solving is: when the hash table
> >> has no free elements, we don't need to call __pcpu_freelist_pop() to
> >> traverse and spin_lock all CPUs. The ftrace output of this bad case is
> >> below:
> >>
> >>  50)               |  htab_map_update_elem() {
> >>  50)   0.329 us    |    _raw_spin_lock_irqsave();
> >>  50)   0.063 us    |    lookup_elem_raw();
> >>  50)               |    alloc_htab_elem() {
> >>  50)               |      pcpu_freelist_pop() {
> >>  50)   0.209 us    |        _raw_spin_lock();
> >>  50)   0.264 us    |        _raw_spin_lock();
> >
> > This is an LRU map, not a hash map.
> > It will grab the spin_locks of other cpus
> > only if all previous cpus don't have free elements.
> > Most likely your map is actually full and doesn't have any free elems.
> > Since it's an LRU it will force-free an elem eventually.
>
> Maybe I missed something. The map_update_elem function of the LRU map is
> htab_lru_map_update_elem(), and the htab_map_update_elem() above is the
> map_update_elem function of the hash map.
> Because of the percpu freelist implementation used in the hash map, it
> will spin_lock all other CPUs when there are no free elements.

Ahh. Right. Then what's the point of optimizing the error case at the
expense of the fast path?