From: "Joel Fernandes" <joelagnelf@xxxxxxxxxx> The rcu_seq_done() API has a large "false-negative" windows of size ULONG_MAX/2, where after wrap around, it is possible that it will think that a GP has not completed if a wrap around happens and the delta is large. One place this might cause a possible problem is SRCU: poll_state_synchronize_srcu() uses rcu_seq_done() unlike poll_state_synchronize_rcu() which uses rcu_seq_done_exact(). The rcu_seq_done_exact() makes more sense for polling API, as there is a higher chance that there is a significant delay between the get_state..() and poll_state..() calls. Another place where this seems scary is if the condition for the wakeup was false causing missed wakeups, example in tree-nocb: swait_event_interruptible_exclusive( rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1], rcu_seq_done(&rnp->gp_seq, wait_gp_seq) || !READ_ONCE(my_rdp->nocb_gp_sleep)); The shorter false-negative window of rcu_seq_done_exact() would improve robustness as rcu_seq_done_exact() makes the window of false-negativity by only ~2-3 GPs versus ULONG_MAX/2. It also results in a negative code delta and could potentially avoid issues in the future where rcu_seq_done() was reporting false-negatives for too long. One downside of this change is the slightly higher computation, but it is trivial computation and I think is worth it. rcutorture runs of all scenarios for 15 minutes passed. Code inspection was done thoroughly for all users to convince the change would work. Further inspection reveals it is more robust so it is more than a cleanup. Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx> --- v1->v2: Updated commit message with more analysis and points. kernel/rcu/rcu.h | 13 ++----------- kernel/rcu/tree.c | 6 +++--- 2 files changed, 5 insertions(+), 14 deletions(-) diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index eed2951a4962..c2ca196907cb 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) /* * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred. + * full update-side operation has occurred while also handling + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. */ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) -{ - return ULONG_CMP_GE(READ_ONCE(*sp), s); -} - -/* - * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred, but do not allow the - * (ULONG_MAX / 2) safety-factor/guard-band. - */ -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) { unsigned long cur_s = READ_ONCE(*sp); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b77ccc55557b..835600cec9ba 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); bool poll_state_synchronize_rcu(unsigned long oldstate) { if (oldstate == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { smp_mb(); /* Ensure GP ends before subsequent accesses. */ return true; } @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) smp_mb(); // Order against root rcu_node structure grace-period cleanup. 
Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
---
v1->v2: Updated commit message with more analysis and points.

 kernel/rcu/rcu.h  | 13 ++-----------
 kernel/rcu/tree.c |  6 +++---
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index eed2951a4962..c2ca196907cb 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
 
 /*
  * Given a snapshot from rcu_seq_snap(), determine whether or not a
- * full update-side operation has occurred.
+ * full update-side operation has occurred while also handling
+ * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band.
  */
 static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
-{
-	return ULONG_CMP_GE(READ_ONCE(*sp), s);
-}
-
-/*
- * Given a snapshot from rcu_seq_snap(), determine whether or not a
- * full update-side operation has occurred, but do not allow the
- * (ULONG_MAX / 2) safety-factor/guard-band.
- */
-static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
 	unsigned long cur_s = READ_ONCE(*sp);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b77ccc55557b..835600cec9ba 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full);
 bool poll_state_synchronize_rcu(unsigned long oldstate)
 {
 	if (oldstate == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) {
+	    rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
@@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
 
 	smp_mb(); // Order against root rcu_node structure grace-period cleanup.
 	if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) ||
+	    rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) ||
 	    rgosp->rgos_exp == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) {
+	    rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
-- 
2.34.1