Re: [PATCH V2] blk-cgroup: Flush stats before releasing blkcg_gq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 24, 2023 at 12:19:57AM -0400, Waiman Long wrote:
> On 5/23/23 23:51, Ming Lei wrote:
> > As noted by Michal, the blkg_iostat_set's in the lockless list hold
> > reference to blkg's to protect against their removal. Those blkg's
> > hold reference to blkcg. When a cgroup is being destroyed,
> > cgroup_rstat_flush() is only called at css_release_work_fn() which
> > is called when the blkcg reference count reaches 0. This circular
> > dependency will prevent blkcg and some blkgs from being freed after
> > they are made offline.
> > 
> > It is less a problem if the cgroup to be destroyed also has other
> > controllers like memory that will call cgroup_rstat_flush() which will
> > clean up the reference count. If block is the only controller that uses
> > rstat, these offline blkcg and blkgs may never be freed leaking more
> > and more memory over time.
> > 
> > To prevent this potential memory leak:
> > 
> > - flush blkcg per-cpu stats list in __blkg_release(), when no new stat
> > can be added
> > 
> > - don't grab bio->bi_blkg reference when adding the stats into blkcg's
> > per-cpu stat list since all stats are guaranteed to be consumed before
> > releasing blkg instance, and grabbing blkg reference for stats was the
> > most fragile part of original patch
> > 
> > Based on Waiman's patch:
> > 
> > https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@xxxxxxxxxx/
> > 
> > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
> > Cc: Waiman Long <longman@xxxxxxxxxx>
> > Cc: Tejun Heo <tj@xxxxxxxxxx>
> > Cc: mkoutny@xxxxxxxx
> > Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> > V2:
> > 	- remove kernel/cgroup change, and call blkcg_rstat_flush()
> > 	to flush stat directly
> > 
> >   block/blk-cgroup.c | 29 +++++++++++++++++++++--------
> >   1 file changed, 21 insertions(+), 8 deletions(-)
> > 
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 0ce64dd73cfe..ed0eb8896972 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -34,6 +34,8 @@
> >   #include "blk-ioprio.h"
> >   #include "blk-throttle.h"
> > +static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu);
> > +
> >   /*
> >    * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
> >    * blkcg_pol_register_mutex nests outside of it and synchronizes entire
> > @@ -163,10 +165,21 @@ static void blkg_free(struct blkcg_gq *blkg)
> >   static void __blkg_release(struct rcu_head *rcu)
> >   {
> >   	struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
> > +	struct blkcg *blkcg = blkg->blkcg;
> > +	int cpu;
> >   #ifdef CONFIG_BLK_CGROUP_PUNT_BIO
> >   	WARN_ON(!bio_list_empty(&blkg->async_bios));
> >   #endif
> > +	/*
> > +	 * Flush all the non-empty percpu lockless lists before releasing
> > +	 * us, given these stat belongs to us.
> > +	 *
> > +	 * cgroup locks aren't needed here since __blkcg_rstat_flush just
> > +	 * propagates delta into blkg parent, which is live now.
> > +	 */
> > +	for_each_possible_cpu(cpu)
> > +		__blkcg_rstat_flush(blkcg, cpu);
> >   	/* release the blkcg and parent blkg refs this blkg has been holding */
> >   	css_put(&blkg->blkcg->css);
> > @@ -951,17 +964,12 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
> >   	u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
> >   }
> > -static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> > +static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
> >   {
> > -	struct blkcg *blkcg = css_to_blkcg(css);
> >   	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
> >   	struct llist_node *lnode;
> >   	struct blkg_iostat_set *bisc, *next_bisc;
> > -	/* Root-level stats are sourced from system-wide IO stats */
> > -	if (!cgroup_parent(css->cgroup))
> > -		return;
> > -
> >   	rcu_read_lock();
> >   	lnode = llist_del_all(lhead);
> > @@ -991,13 +999,19 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >   		if (parent && parent->parent)
> >   			blkcg_iostat_update(parent, &blkg->iostat.cur,
> >   					    &blkg->iostat.last);
> > -		percpu_ref_put(&blkg->refcnt);
> >   	}
> >   out:
> >   	rcu_read_unlock();
> >   }
> > +static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> > +{
> > +	/* Root-level stats are sourced from system-wide IO stats */
> > +	if (cgroup_parent(css->cgroup))
> > +		__blkcg_rstat_flush(css_to_blkcg(css), cpu);
> > +}
> > +
> 
> I think it may not safe to call __blkcg_rstat_flus() directly without taking
> the cgroup_rstat_cpu_lock. That is why I added a helper to
> kernel/cgroup/rstat.c in my patch to meet the locking requirement.

All stats are removed from llist_del_all(), and the local list is
iterated, then each blkg & its parent is touched in __blkcg_rstat_flus(), so
can you explain it a bit why cgroup locks are needed? For protecting
what?

Thanks,
Ming




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux