Hello,

On Thu, Sep 20, 2018 at 06:18:21PM +0800, Jianchao Wang wrote:
> -static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>  {
>  	unsigned long __percpu *percpu_count;
>
> -	rcu_read_lock_sched();

So, if we're gonna do this (please read below tho), please add the
matching assertion here.

>  	if (__ref_is_percpu(ref, &percpu_count))
>  		this_cpu_add(*percpu_count, nr);
>  	else
>  		atomic_long_add(nr, &ref->count);
> +}
>
> +/**
> + * percpu_ref_get_many - increment a percpu refcount
> + * @ref: percpu_ref to get
> + * @nr: number of references to get
> + *
> + * Analogous to atomic_long_add().
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +{
> +	rcu_read_lock_sched();
> +	__percpu_ref_get_many(ref, nr);
>  	rcu_read_unlock_sched();
>  }

And please add the matching variants for get and put, with and without
_many.

Ming, so, if we make the locking explicit like above, I think it should
be fine to share the locking. However, please note that percpu_ref and
blk_mq are using different flavors of RCU, at least for now, and I'm not
convinced that unifying them to take out one RCU read lock/unlock is a
meaningful optimization.

Let's please first do something straightforward. If somebody can show
that this actually impacts performance, we can optimize it, but right
now all of this seems premature to me.

Thanks.

--
tejun