Re: [PATCH 3.13, 3.14, 3.16, 3.17] rcu: Reduce rcu kthread wakeups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2014-10-13 at 12:29 -0400, Pranith Kumar wrote:
> Backport 2aa792e and parts of 48a7639 to 3.13 which is Ubuntu LTS kernel.
> 
> This commit reduces the number of rcu kthread wakeups. Tested for over a week on
> my desktop 14.04 system.
> 
> Signed-off-by: Pranith Kumar <bobby.prani@xxxxxxxxx>
> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>


Hi Pranith and Paul-

This looks good, but how about lets split it back into two commits (the
second a clean cherry-pick) and submit it for 3.{14,16,17}-stable also:

3.13 and 3.14 could apply the attached 48a7639 mini-backport (just
supplying rcu_gp_kthread_wake), and then cherry-pick 2aa792e.

3.16 and 3.17 carry 48a7639 already, so could just cherry-pick 2aa792e.

 -Kamal


>  kernel/rcu/tree.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index dd08198..9b3beea 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1193,6 +1193,22 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
>  }
>  
>  /*
> + * Awaken the grace-period kthread for the specified flavor of RCU.
> + * Don't do a self-awaken, and don't bother awakening when there is
> + * nothing for the grace-period kthread to do (as in several CPUs
> + * raced to awaken, and we lost), and finally don't try to awaken
> + * a kthread that has not yet been created.
> + */
> +static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> +{
> +	if (current == rsp->gp_kthread ||
> +	    !ACCESS_ONCE(rsp->gp_flags) ||
> +	    !rsp->gp_kthread)
> +		return;
> +	wake_up(&rsp->gp_wq);
> +}
> +
> +/*
>   * If there is room, assign a ->completed number to any callbacks on
>   * this CPU that have not already been assigned.  Also accelerate any
>   * callbacks that were previously assigned a ->completed number that has
> @@ -1625,7 +1641,7 @@ static void rsp_wakeup(struct irq_work *work)
>  	struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
>  
>  	/* Wake up rcu_gp_kthread() to start the grace period. */
> -	wake_up(&rsp->gp_wq);
> +	rcu_gp_kthread_wake(rsp);
>  }
>  
>  /*
> @@ -1701,7 +1717,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
>  {
>  	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
>  	raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
> -	wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
> +	rcu_gp_kthread_wake(rsp);
>  }
>  
>  /*
> @@ -2271,7 +2287,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>  	}
>  	rsp->gp_flags |= RCU_GP_FLAG_FQS;
>  	raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> -	wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
> +	rcu_gp_kthread_wake(rsp);
>  }
>  
>  /*

>From b433a3686a2031d9de32d49249519a851688b54a Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Tue, 11 Mar 2014 13:02:16 -0700
Subject: rcu: Make callers awaken grace-period kthread

commit 48a7639ce80cf279834d0d44865e49ecd714f37d upstream.

The rcu_start_gp_advanced() function currently uses irq_work_queue()
to defer wakeups of the RCU grace-period kthread.  This deferring
is necessary to avoid RCU-scheduler deadlocks involving the rcu_node
structure's lock, meaning that RCU cannot call any of the scheduler's
wake-up functions while holding one of these locks.

Unfortunately, the second and subsequent calls to irq_work_queue() are
ignored, and the first call will be ignored (aside from queuing the work
item) if the scheduler-clock tick is turned off.  This is OK for many
uses, especially those where irq_work_queue() is called from an interrupt
or softirq handler, because in those cases the scheduler-clock-tick state
will be re-evaluated, which will turn the scheduler-clock tick back on.
On the next tick, any deferred work will then be processed.

However, this strategy does not always work for RCU, which can be invoked
at process level from idle CPUs.  In this case, the tick might never
be turned back on, indefinitely defering a grace-period start request.
Note that the RCU CPU stall detector cannot see this condition, because
there is no RCU grace period in progress.  Therefore, we can (and do!)
see long tens-of-seconds stalls in grace-period handling.  In theory,
we could see a full grace-period hang, but rcutorture testing to date
has seen only the tens-of-seconds stalls.  Event tracing demonstrates
that irq_work_queue() is being called repeatedly to no effect during
these stalls: The "newreq" event appears repeatedly from a task that is
not one of the grace-period kthreads.

In theory, irq_work_queue() might be fixed to avoid this sort of issue,
but RCU's requirements are unusual and it is quite straightforward to pass
wake-up responsibility up through RCU's call chain, so that the wakeup
happens when the offending locks are released.

This commit therefore makes this change.  The rcu_start_gp_advanced(),
rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(),
__note_gp_changes(), and rcu_start_gp() functions now return a boolean
which indicates when a wake-up is needed.  A new rcu_gp_kthread_wake()
does the wakeup when it is necessary and safe to do so: No self-wakes,
no wake-ups if the ->gp_flags field indicates there is no need (as in
someone else did the wake-up before we got around to it), and no wake-ups
before the grace-period kthread has been created.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
[ Pranith: backport to 3.13-stable: just rcu_gp_kthread_wake(),
  prereq for 2aa792e "rcu: Use rcu_gp_kthread_wake() to wake up grace
  period kthreads" ]
Signed-off-by: Pranith Kumar <bobby.prani@xxxxxxxxx>
Signed-off-by: Kamal Mostafa <kamal@xxxxxxxxxxxxx>
---
 kernel/rcu/tree.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dd08198..e559421 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1193,6 +1193,22 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 }
 
 /*
+ * Awaken the grace-period kthread for the specified flavor of RCU.
+ * Don't do a self-awaken, and don't bother awakening when there is
+ * nothing for the grace-period kthread to do (as in several CPUs
+ * raced to awaken, and we lost), and finally don't try to awaken
+ * a kthread that has not yet been created.
+ */
+static void rcu_gp_kthread_wake(struct rcu_state *rsp)
+{
+	if (current == rsp->gp_kthread ||
+	    !ACCESS_ONCE(rsp->gp_flags) ||
+	    !rsp->gp_kthread)
+		return;
+	wake_up(&rsp->gp_wq);
+}
+
+/*
  * If there is room, assign a ->completed number to any callbacks on
  * this CPU that have not already been assigned.  Also accelerate any
  * callbacks that were previously assigned a ->completed number that has
@@ -1625,7 +1641,7 @@ static void rsp_wakeup(struct irq_work *work)
 	struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
 
 	/* Wake up rcu_gp_kthread() to start the grace period. */
-	wake_up(&rsp->gp_wq);
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*
-- 
1.9.1


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]