Re: [PATCH 3/3] lib/percpu-refcount: drain ->release() in percpu_ref_exit()

Dennis Zhou <dennis@xxxxxxxxxx> · Wed, 14 Dec 2022 08:07:28 -0800

Hello,

On Wed, Dec 14, 2022 at 09:30:08PM +0800, Ming Lei wrote:
> On Wed, Dec 14, 2022 at 04:16:51PM +0800, Hillf Danton wrote:
> > On 14 Dec 2022 10:51:01 +0800 Ming Lei <ming.lei@xxxxxxxxxx>
> > > The pattern of wait_event(percpu_ref_is_zero()) has been used in several
> > 
> > For example?
> 
> blk_mq_freeze_queue_wait() and target_wait_for_sess_cmds().
> 
> > 
> > > kernel components, and this approach actually carries the following risk:
> > > 
> > > - percpu_ref_is_zero() can return true in the window between
> > >   atomic_long_sub_and_test() and ref->data->release(ref)
> > > 
> > > - once the refcount is observed as zero, percpu_ref_exit() could
> > >   be called and the host data structure freed
> > > 
> > > - a use-after-free is then triggered in ->release() when the user's host
> > >   data structure has been freed after percpu_ref_exit() returns
> > 
> > The race between exit and the release callback should be handled at the
> > corresponding call site, given the comment below, and closed, for instance,
> > by synchronizing RCU.
> > 
> > /**
> >  * percpu_ref_put_many - decrement a percpu refcount
> >  * @ref: percpu_ref to put
> >  * @nr: number of references to put
> >  *
> >  * Decrement the refcount, and if 0, call the release function (which was passed
> >  * to percpu_ref_init())
> >  *
> >  * This function is safe to call as long as @ref is between init and exit.
> >  */
> 
> Not sure if the above comment implies that the call site should handle the
> race.
> 
> But blk-mq can really avoid the trouble by using the existing call_rcu():
> 

I struggle with the dependency on release(). release() itself should not
block, but a common pattern is to throw in a call_rcu() and schedule
additional work; see blkg_release() in block/blk-cgroup.c.

I think the real dependency is on the completion of release() and of the
work scheduled on its behalf, rather than merely on the start of the
release() callback. This series doesn't preclude that from happening.
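To make the distinction concrete, here is a single-threaded userspace model (not kernel code) in which release() does not block but defers the real teardown, the way blkg_release() hands work to call_rcu(). The names toy_ref, toy_call_rcu() and toy_drain_deferred() are hypothetical stand-ins for percpu_ref, call_rcu() and the end of a grace period. It only sketches the ordering: once the count is zero and release() has even returned, the deferred teardown still has not run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; this is NOT the kernel percpu_ref API. */
struct toy_ref {
	atomic_long count;
	void (*release)(struct toy_ref *ref);
};

/* One-slot deferred-callback queue standing in for call_rcu(). */
static void (*pending_cb)(void *arg);
static void *pending_arg;

static void toy_call_rcu(void (*cb)(void *), void *arg)
{
	assert(pending_cb == NULL);	/* a single slot is enough for this model */
	pending_cb = cb;
	pending_arg = arg;
}

/* Stand-in for a grace period elapsing: run whatever was deferred. */
static void toy_drain_deferred(void)
{
	void (*cb)(void *) = pending_cb;

	pending_cb = NULL;
	if (cb)
		cb(pending_arg);
}

static bool toy_ref_is_zero(struct toy_ref *ref)
{
	return atomic_load(&ref->count) == 0;
}

static void toy_ref_put(struct toy_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);		/* catch underflow */
	if (old == 1)			/* count just reached zero */
		ref->release(ref);
}

/* Embedding object, loosely modelled on a blkg. */
struct toy_obj {
	struct toy_ref ref;
	bool release_returned;
	bool teardown_done;
};

/* Deferred teardown; real code would free the object here. */
static void toy_obj_teardown(void *arg)
{
	struct toy_obj *obj = arg;

	obj->teardown_done = true;
}

/* Non-blocking release(): queue the real work and return immediately. */
static void toy_obj_release(struct toy_ref *ref)
{
	struct toy_obj *obj = (struct toy_obj *)((char *)ref -
						 offsetof(struct toy_obj, ref));

	toy_call_rcu(toy_obj_teardown, obj);
	obj->release_returned = true;
}
```

In this model, a caller that frees the embedding object as soon as toy_ref_is_zero() is true, or as soon as release() has returned, would still race with toy_obj_teardown().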

/**
 * percpu_ref_exit - undo percpu_ref_init()
 * @ref: percpu_ref to exit
 *
 * This function exits @ref.  The caller is responsible for ensuring that
 * @ref is no longer in active use.  The usual places to invoke this
 * function from are the @ref->release() callback or in init failure path
 * where percpu_ref_init() succeeded but other parts of the initialization
 * of the embedding object failed.
 */

I think the percpu_ref_exit() comment describes the more common approach
to using percpu refcounts: release() itself triggering percpu_ref_exit() is
the ideal case.
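A minimal userspace model of that ideal case, assuming a plain atomic counter in place of the percpu machinery. toy_ref, toy_ref_init(), toy_ref_exit(), toy_obj_create() and the other toy_* names are hypothetical and are not the percpu_ref API. Because release() is the one calling toy_ref_exit() and freeing the object, there is no separate waiter that can observe a zero count and free the object while release() is still running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; this is NOT the kernel percpu_ref API. */
struct toy_ref {
	atomic_long count;
	void (*release)(struct toy_ref *ref);
};

static int toy_exit_calls;	/* counters so the model can be checked */
static int toy_free_calls;

static void toy_ref_init(struct toy_ref *ref, void (*release)(struct toy_ref *))
{
	atomic_init(&ref->count, 1);	/* initial reference held by the owner */
	ref->release = release;
}

static void toy_ref_exit(struct toy_ref *ref)
{
	assert(atomic_load(&ref->count) == 0);	/* no longer in active use */
	toy_exit_calls++;
}

static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

static void toy_ref_put(struct toy_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);		/* catch underflow */
	if (old == 1)			/* last reference: release() owns teardown */
		ref->release(ref);
}

struct toy_obj {
	struct toy_ref ref;
	int payload;
};

/* release() itself exits the ref and frees the embedding object. */
static void toy_obj_release(struct toy_ref *ref)
{
	struct toy_obj *obj = (struct toy_obj *)((char *)ref -
						 offsetof(struct toy_obj, ref));

	toy_ref_exit(&obj->ref);
	free(obj);
	toy_free_calls++;
}

static struct toy_obj *toy_obj_create(void)
{
	struct toy_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	toy_ref_init(&obj->ref, toy_obj_release);
	obj->payload = 0;
	return obj;
}
```

Usage: the owner drops its initial reference (roughly what percpu_ref_kill() does for a real percpu_ref), and whichever put brings the count to zero runs release(), which exits and frees; nothing touches the object after that.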

Thanks,
Dennis

> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3866b6c4cd88..9321767470dc 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -254,14 +254,15 @@ EXPORT_SYMBOL_GPL(blk_clear_pm_only);
>  
>  static void blk_free_queue_rcu(struct rcu_head *rcu_head)
>  {
> -	kmem_cache_free(blk_requestq_cachep,
> -			container_of(rcu_head, struct request_queue, rcu_head));
> +	struct request_queue *q = container_of(rcu_head,
> +			struct request_queue, rcu_head);
> +
> +	percpu_ref_exit(&q->q_usage_counter);
> +	kmem_cache_free(blk_requestq_cachep, q);
>  }
>  
>  static void blk_free_queue(struct request_queue *q)
>  {
> -	percpu_ref_exit(&q->q_usage_counter);
> -
>  	if (q->poll_stat)
>  		blk_stat_remove_callback(q, q->poll_cb);
>  	blk_stat_free_callback(q->poll_cb);
> 
> 
> Thanks, 
> Ming
>