From: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> Please consider this is an RFC for discussion only. Just want to discuss why the GP_REPLAY state is needed at all. Here's the intention AFAICS: When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while we wait for a grace period before transitioning to GP_IDLE. In the meanwhile, if we receive another rcu_sync_exit(), then we want to wait for another GP to account for that. Drawing some timing diagrams, it looks like this: Legend: rse = rcu_sync_enter rsx = rcu_sync_exit i = GP_IDLE x = GP_EXIT r = GP_REPLAY e = GP_ENTER p = GP_PASSED rx = GP_REPLAY changes to GP_EXIT GP num = The GP we are one. note: A GP passes between the states: e and p x and i x and rx rx and i In a simple case, the timing and states look like: time ----------------------> GP num 1111111 2222222 GP state i e p x i CPU0 : rse rsx However we can enter the replay state like this: time ----------------------> GP num 1111111 2222222222222222222223333333 GP state i e p x r rx i CPU0 : rse rsx CPU1 : rse rsx Due to the second rse + rsx, we had to wait for another GP. I believe the rationale is, if another rsx happens, another GP has to happen. But this is not always true if you consider the following events: time ----------------------> GP num 111111 22222222222222222222222222222222233333333 GP state i e p x r rx i CPU0 : rse rsx CPU1 : rse rsx CPU2 : rse rsx Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(), and 2 for the rcu_sync_exit(s). However, we had 3 rcu_sync_exit()s, not 2. In other words, the rcu_sync_exit() got batched. So my point here is, rcu_sync_exit() does not really always cause a new GP to happen and we can end up having the rcu_sync_exit()s as batched and sharing the same grace period. Then what is the point of the GP_REPLAY state at all if it does not always wait for a new GP? Taking a step back, why did we intend to have to wait for a new GP if another rcu_sync_exit() comes while one is still in progress? Cc: bristot@xxxxxxxxxx Cc: peterz@xxxxxxxxxxxxx Cc: oleg@xxxxxxxxxx Cc: paulmck@xxxxxxxxxx Cc: rcu@xxxxxxxxxxxxxxx Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> --- kernel/rcu/sync.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c index d4558ab7a07d..4f3aad67992c 100644 --- a/kernel/rcu/sync.c +++ b/kernel/rcu/sync.c @@ -10,7 +10,7 @@ #include <linux/rcu_sync.h> #include <linux/sched.h> -enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY }; +enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT }; #define rss_lock gp_wait.lock @@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp) */ WRITE_ONCE(rsp->gp_state, GP_PASSED); wake_up_locked(&rsp->gp_wait); - } else if (rsp->gp_state == GP_REPLAY) { - /* - * A new rcu_sync_exit() has happened; requeue the callback to - * catch a later GP. - */ - WRITE_ONCE(rsp->gp_state, GP_EXIT); - rcu_sync_call(rsp); } else { /* * We're at least a GP after the last rcu_sync_exit(); eveybody @@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp) */ void rcu_sync_exit(struct rcu_sync *rsp) { - WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE); - WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0); + WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED); spin_lock_irq(&rsp->rss_lock); if (!--rsp->gp_count) { if (rsp->gp_state == GP_PASSED) { WRITE_ONCE(rsp->gp_state, GP_EXIT); rcu_sync_call(rsp); - } else if (rsp->gp_state == GP_EXIT) { - WRITE_ONCE(rsp->gp_state, GP_REPLAY); } } spin_unlock_irq(&rsp->rss_lock); -- 2.23.0.581.g78d2f28ef7-goog