On Mon, Mar 28, 2022 at 8:58 AM Waiman Long <longman@xxxxxxxxxx> wrote:
>
> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the latter function to miss
> reparenting of a lru entry as shown below:
>
> CPU0:                                CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                      memcg_reparent_objcgs(memcg)
>                                      memcg_reparent_list_lrus(memcg)
>                                          memcg_reparent_list_lru()
>                                              memcg_reparent_list_lru_node()
>                                                  if (!READ_ONCE(nlru->nr_items))
>                                                      // Miss reparenting
>                                                      return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
>
> Though it is unlikely that a list_lru_node that has 0 items suddenly
> has a newly added lru entry at the end of its life, the race is still
> theoretically possible.
>
> Adding a spin_is_locked() check will likely be enough for x86, but it
> is less certain for other arches with more relaxed memory semantics
> like arm64 and ppc. To avoid the race, this patch moves the nr_items
> check to within the lock critical section.
>
> Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>

How about the following patch? It has low overhead on x86_64. Even on
architectures with a more relaxed memory model, I think it also has lower
overhead since it avoids a store operation to nlru->lock. We do not need
to insert an smp_wmb() into list_lru_add() since spin_lock() always
implies at least load-acquire semantics.

Thanks.

diff --git a/mm/list_lru.c b/mm/list_lru.c
index c669d87001a6..0e58374b629b 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -397,8 +397,11 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	/*
 	 * If there is no lru entry in this nlru, we can skip it immediately.
 	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
+	if (!READ_ONCE(nlru->nr_items)) {
+		smp_rmb();
+		if (!spin_is_locked(&nlru->lock))
+			return;
+	}
 
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
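
For readers less familiar with the ordering argument above, here is a
minimal user-space sketch of how the proposed check is meant to pair with
the add side. This is not the kernel code: the names (fake_nlru, lru_add,
reparent_node_can_skip) are made up for illustration, and C11 atomics
stand in for spin_lock()/spin_is_locked()/smp_rmb(). It only mirrors the
reasoning in this mail, under the assumption that the lock word acts as
the store the reader re-checks.

	/* User-space sketch only; hypothetical names, C11 atomics. */
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct fake_nlru {
		atomic_int  locked;     /* stands in for nlru->lock */
		atomic_long nr_items;   /* stands in for nlru->nr_items */
	};

	static void lock(struct fake_nlru *n)
	{
		int expected = 0;

		/* acquire semantics, analogous to spin_lock() */
		while (!atomic_compare_exchange_weak_explicit(&n->locked,
					&expected, 1,
					memory_order_acquire,
					memory_order_relaxed))
			expected = 0;
	}

	static void unlock(struct fake_nlru *n)
	{
		atomic_store_explicit(&n->locked, 0, memory_order_release);
	}

	/* Add side, like list_lru_add(): take the lock, then bump the count. */
	static void lru_add(struct fake_nlru *n)
	{
		lock(n);
		atomic_fetch_add_explicit(&n->nr_items, 1, memory_order_relaxed);
		/* ... add the entry to the list ... */
		unlock(n);
	}

	/*
	 * Reparent side, mirroring the proposal's reasoning: if nr_items
	 * reads 0 and, after a read barrier (smp_rmb() analogue), the lock
	 * is not held, the proposal argues no concurrent lru_add() can be
	 * in the middle of adding the first entry, so the node may be
	 * skipped without taking the lock.
	 */
	static bool reparent_node_can_skip(struct fake_nlru *n)
	{
		if (atomic_load_explicit(&n->nr_items, memory_order_relaxed))
			return false;

		atomic_thread_fence(memory_order_acquire);  /* like smp_rmb() */

		return atomic_load_explicit(&n->locked, memory_order_relaxed) == 0;
	}

	int main(void)
	{
		struct fake_nlru n = { 0, 0 };

		lru_add(&n);
		/* nr_items is now 1, so the node cannot be skipped. */
		printf("can skip after add: %d\n", reparent_node_can_skip(&n));
		return 0;
	}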