On Wed, Dec 14, 2022 at 10:31:32PM -0500, Waiman Long wrote: > As noted by Michal, the blkg_iostat_set's in the lockless list > hold reference to blkg's to protect against their removal. Those > blkg's hold reference to blkcg. When a cgroup is being destroyed, > cgroup_rstat_flush() is only called at css_release_work_fn() which > is called when the blkcg reference count reaches 0. This circular > dependency will prevent blkcg and some blkgs from being freed after > they are made offline. > > It is less a problem if the cgroup to be destroyed also has other > controllers like memory that will call cgroup_rstat_flush() which will > clean up the reference count. If block is the only controller that uses > rstat, these offline blkcg and blkgs may never be freed leaking more > and more memory over time. > > To prevent this potential memory leak, a new cgroup_rstat_css_cpu_flush() > function is added to flush stats for a given css and cpu. This new > function will be called at blkgs destruction path, blkcg_destroy_blkgs(), > whenever there are still pending stats to be flushed. This will release > the references to blkgs allowing them to be freed and indirectly allow > the freeing of blkcg. > > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") > Signed-off-by: Waiman Long <longman@xxxxxxxxxx> > Acked-by: Tejun Heo <tj@xxxxxxxxxx> > --- > block/blk-cgroup.c | 16 ++++++++++++++++ > include/linux/cgroup.h | 1 + > kernel/cgroup/rstat.c | 18 ++++++++++++++++++ > 3 files changed, 35 insertions(+) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index ca28306aa1b1..a2a1081d9d1d 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -1084,6 +1084,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css) > */ > static void blkcg_destroy_blkgs(struct blkcg *blkcg) > { > + int cpu; > + > /* > * blkcg_destroy_blkgs() shouldn't be called with all the blkcg > * references gone. > @@ -1093,6 +1095,20 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg) > > might_sleep(); > > + /* > + * Flush all the non-empty percpu lockless lists to release the > + * blkg references held by those lists which, in turn, will > + * allow those blkgs to be freed and release their references to > + * blkcg. Otherwise, they may not be freed at all becase of this > + * circular dependency resulting in memory leak. > + */ > + for_each_possible_cpu(cpu) { > + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > + > + if (!llist_empty(lhead)) > + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > + } I guess it is possible for new iostat_cpu to be added just after the llist_empty() check. Thanks, Ming