On 2/28/2024 9:32 AM, Joel Fernandes wrote:
>
>
> On 2/20/2024 1:31 PM, Uladzislau Rezki (Sony) wrote:
[...]
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index c8980d76f402..1328da63c3cd 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -75,6 +75,7 @@
>>  #define MODULE_PARAM_PREFIX "rcutree."
>>
>>  /* Data structures. */
>> +static void rcu_sr_normal_gp_cleanup_work(struct work_struct *);
>>
>>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>>  	.gpwrap = true,
>> @@ -93,6 +94,8 @@ static struct rcu_state rcu_state = {
>>  	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
>>  	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
>>  	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>> +	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>> +		rcu_sr_normal_gp_cleanup_work),
>>  };
>>
>>  /* Dump rcu_node combining tree at boot to verify correct setup. */
>> @@ -1422,6 +1425,282 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
>>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>>  }
> [..]
>> +static void rcu_sr_normal_add_req(struct rcu_synchronize *rs)
>> +{
>> +	llist_add((struct llist_node *) &rs->head, &rcu_state.srs_next);
>> +}
>> +
>
> I'm a bit concerned from a memory order PoV about this llist_add() happening
> possibly on a different CPU than the GP thread, and different than the kworker
> thread. Basically we can have 3 CPUs simultaneously modifying and reading the
> list, but only 2 CPUs have the acq-rel pair AFAICS.
>
> Consider the following situation:
>
> synchronize_rcu() user
> ----------------------
> llist_add the user U - update srs_next list
>
> rcu_gp_init() and rcu_gp_cleanup (SAME THREAD)
> --------------------
> insert dummy node in front of U, call it S
> update wait_tail to U
>
> and then cleanup:
> read wait_tail to W
> set wait_tail to NULL
> set done_tail to W (RELEASE) -- this release ensures U and S are seen by worker.
>
> workqueue handler
> -----------------
> read done_tail (ACQUIRE)
> disconnect rest of list -- disconnected list guaranteed to have U and S,
>                            if done_tail read was W.
> ---------------------------------
>
> So llist_add() does this (assume new_first and new_last are same):
>
> 	struct llist_node *first = READ_ONCE(head->first);
>
> 	do {
> 		new_last->next = first;
> 	} while (!try_cmpxchg(&head->first, &first, new_first));
>
> 	return !first;
> ---
>
> It reads head->first, then writes the new_last->next (call it new_first->next)
> to the old first, then sets head->first to the new_first if head->first did not
> change in the meanwhile.
>
> The problem I guess happens if the update the head->first is seen *after* the
> update to the new_first->next.
>
> This potentially means a corrupted list is seen in the workqueue handler..
> because the "U" node is not yet seen pointing to the rest of the list
> (previously added nodes), but is already seen the head of the list.
>
> I am not sure if this can happen, but AFAIK try_cmpxchg() doesn't imply ordering
> per-se. Maybe that try_cmpxchg() should be a try_cmpxchg_release() in llist_add() ?

Everyone in the internal RCU crew corrected me offline: try_cmpxchg() is fully
ordered when the cmpxchg succeeds. So I don't think the issue I mentioned can
occur, and we can park this.

Thanks!

 - Joel
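
For anyone following along, here is a minimal userspace sketch of the push
pattern discussed above, written with C11 atomics rather than the kernel
primitives. The names node, head, push() and del_all() are made up for
illustration and are not the kernel's llist API. Using memory_order_seq_cst on
the successful compare-exchange is only an approximation of the kernel's fully
ordered try_cmpxchg(), not an exact equivalent. What it is meant to show: the
store to n->next is ordered before the successful update of the head, so a
reader that picks up the new head also sees that node's next pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct llist_node / llist_head. */
struct node {
	struct node *next;
};

struct head {
	_Atomic(struct node *) first;
};

/* Push n onto h. Returns true if the list was empty before the push. */
static bool push(struct head *h, struct node *n)
{
	struct node *first = atomic_load_explicit(&h->first,
						  memory_order_relaxed);

	do {
		/* Plain store; published by the successful CAS below. */
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &first, n,
							memory_order_seq_cst,
							memory_order_relaxed));
	/* On failure, first was reloaded with the current head; retry. */
	return first == NULL;
}

/* Detach the whole list in one step and return its old head. */
static struct node *del_all(struct head *h)
{
	return atomic_exchange_explicit(&h->first, (struct node *)NULL,
					memory_order_seq_cst);
}
```

A relaxed ordering on the failure path is enough in this sketch because a
failed compare-exchange publishes nothing; only the successful one makes the
node reachable.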