Re: [PATCH v2 0/6] RCU tasks fixes for v6.9

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Fri, 23 Feb 2024 16:43:04 -0800

On Fri, Feb 23, 2024 at 01:25:06PM +0100, Frederic Weisbecker wrote:
> On Thu, Feb 22, 2024 at 02:09:17PM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> > > Le Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng a écrit :
> > > > Hi,
> > > > 
> > > > This series contains the fixes of RCU tasks for v6.9. You can also find
> > > > the series at:
> > > > 
> > > > 	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > > > 
> > > > Changes since v1:
> > > > 
> > > > *	Update with Paul's rework on "Eliminate deadlocks involving
> > > > 	do_exit() and RCU task"
> > > > 
> > > > The detailed list of changes:
> > > > 
> > > > Paul E. McKenney (6):
> > > >   rcu-tasks: Repair RCU Tasks Trace quiescence check
> > > >   rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> > > 
> > > Food for later thoughts and further improvements: would it make sense to
> > > call exit_rcu_tasks_start() on fork() instead and rely solely on
> > > each CPUs' rtp_exit_list instead of the tasklist?
> > 
> > It might well.
> > 
> > One big advantage of doing that is the ability to incrementally traverse
> > the tasks.  But is there some good way of doing that to the full task
> > lists?  If so, everyone could benefit.
> 
> What do you mean by incrementally? You mean being able to cond_resched()
> in the middle of the tasks iteration? Yeah not sure that's possible...

I do indeed mean doing cond_resched() mid-stream.

One way to make this happen would be to do something like this:

struct task_struct_anchor {
	struct list_head tsa_list;
	struct list_head tsa_adjust_list;
	atomic_t tsa_ref;  // Or use an appropriate API.
	bool tsa_is_anchor;
}

Each task structure would contain one of these, though there are a
number of ways to conserve space if needed.

These anchors would be placed perhaps every 1,000 tasks or so.  When a
traversal encountered one, it could atomic_inc_not_zero() the reference
count, and if that succeeded, exit the RCU read-side critical section and
do a cond_resched().  It could then enter a new RCU read-side critical
section, drop the reference, and continue.

A traveral might container_of() its way from ->tsa_list to the
task_struct_anchor structure, then if ->tsa_is_anchor is false,
container_of() its way to the enclosing task structure.

How to maintain proper spacing of the anchors?

One way is to make the traversals do the checking.  If the space between a
pair of anchors was to large or too small, it could add the first of the
pair to a list to be adjusted.  This list could periodically be processed,
perhaps with more urgency if a huge gap had opened up.

Freeing an anchor requires decrementing the reference count, waiting for
it to go to zero, removing the anchor, waiting for a grace period (perhaps
asynchronously), and only then freeing the anchor.

Anchors cannot be moved, only added or removed.

So it is possible.  But is it reasonable?  ;-)

							Thanx, Paul

> > > Thanks.
> > > 
> > > >   rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > > > 
> > > >  include/linux/rcupdate.h |   4 +-
> > > >  include/linux/sched.h    |   2 +
> > > >  init/init_task.c         |   1 +
> > > >  kernel/fork.c            |   1 +
> > > >  kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
> > > >  5 files changed, 90 insertions(+), 28 deletions(-)
> > > > 
> > > > -- 
> > > > 2.43.0
> > > > 
> > > > 
> > >