On Wed, Dec 14, 2022 at 08:07:28AM -0800, Dennis Zhou wrote: > Hello, > > On Wed, Dec 14, 2022 at 09:30:08PM +0800, Ming Lei wrote: > > On Wed, Dec 14, 2022 at 04:16:51PM +0800, Hillf Danton wrote: > > > On 14 Dec 2022 10:51:01 +0800 Ming Lei <ming.lei@xxxxxxxxxx> > > > > The pattern of wait_event(percpu_ref_is_zero()) has been used in several > > > > > > For example? > > > > blk_mq_freeze_queue_wait() and target_wait_for_sess_cmds(). > > > > > > > > > kernel components, and this way actually has the following risk: > > > > > > > > - percpu_ref_is_zero() can be returned just between > > > > atomic_long_sub_and_test() and ref->data->release(ref) > > > > > > > > - given the refcount is found as zero, percpu_ref_exit() could > > > > be called, and the host data structure is freed > > > > > > > > - then use-after-free is triggered in ->release() when the user host > > > > data structure is freed after percpu_ref_exit() returns > > > > > > The race between exit and the release callback should be considered at the > > > corresponding callsite, given the comment below, and closed for instance > > > by synchronizing rcu. > > > > > > /** > > > * percpu_ref_put_many - decrement a percpu refcount > > > * @ref: percpu_ref to put > > > * @nr: number of references to put > > > * > > > * Decrement the refcount, and if 0, call the release function (which was passed > > > * to percpu_ref_init()) > > > * > > > * This function is safe to call as long as @ref is between init and exit. > > > */ > > > > Not sure if the above comment implies that the callsite should cover the > > race. > > > > But blk-mq can really avoid the trouble by using the existed call_rcu(): > > > > I struggle with the dependency on release(). release() itself should not > block, but a common pattern would be to through a call_rcu() in and Yes, release() is called with rcu read lock, and I guess the trouble may be originated from the fact release() may do nothing related with actual data releasing. > schedule additional work - see block/blk-cgroup.c, blkg_release(). I believe the pattern is user specific, and the motivation of using call_rcu can't be just for avoiding such potential race between release() and percpu_ref_exit(). > > I think the dependency really is the completion of release() and the > work scheduled on it's behalf rather than strictly starting the > release() callback. This series doesn't preclude that from happening. Yeah. For any additional work or sort of thing scheduled in release(), only the caller can guarantee they are drained before percpu_exit_ref(), so I agree now it is better for caller to avoid the race. > > /** > * percpu_ref_exit - undo percpu_ref_init() > * @ref: percpu_ref to exit > * > * This function exits @ref. The caller is responsible for ensuring that > * @ref is no longer in active use. The usual places to invoke this > * function from are the @ref->release() callback or in init failure path > * where percpu_ref_init() succeeded but other parts of the initialization > * of the embedding object failed. > */ > > I think the percpu_ref_exit() comment explains the more common use case > approach to percpu refcounts. release() triggering percpu_ref_exit() is > the ideal case. But most of callers don't use in this way actually. Thanks, Ming