On Wed, May 15, 2024 at 10:30:59AM -0400, Waiman Long wrote:
> During a cgroup_rstat_flush() call, the lowest level of nodes are flushed
> first before their parents. Since commit 3b8cc6298724 ("blk-cgroup:
> Optimize blkcg_rstat_flush()"), iostat propagation was still done to
> the parent. Grandparent, however, may not get the iostat update if the
> parent has no blkg_iostat_set queued in its lhead lockless list.
>
> Fix this iostat propagation problem by queuing the parent's global
> blkg->iostat into one of its percpu lockless lists to make sure that
> the delta will always be propagated up to the grandparent and so on
> toward the root blkcg.
>
> Note that successive calls to __blkcg_rstat_flush() are serialized by
> the cgroup_rstat_lock. So no special barrier is used in the reading
> and writing of blkg->iostat.lqueued.
>
> Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
> Reported-by: Dan Schatzberg <schatzberg.dan@xxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/ZkO6l%2FODzadSgdhC@dschatzberg-fedora-PF3DHTBV/
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
>  block/blk-cgroup.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 059467086b13..2a7624c32a1a 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -323,6 +323,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
>  	blkg->q = disk->queue;
>  	INIT_LIST_HEAD(&blkg->q_node);
>  	blkg->blkcg = blkcg;
> +	blkg->iostat.blkg = blkg;
>  #ifdef CONFIG_BLK_CGROUP_PUNT_BIO
>  	spin_lock_init(&blkg->async_bio_lock);
>  	bio_list_init(&blkg->async_bios);
> @@ -1025,6 +1026,8 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
>  		unsigned int seq;
>
>  		WRITE_ONCE(bisc->lqueued, false);
> +		if (bisc == &blkg->iostat)
> +			goto propagate_up; /* propagate up to parent only */
>
>  		/* fetch the current per-cpu values */
>  		do {
> @@ -1034,10 +1037,24 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
>
>  		blkcg_iostat_update(blkg, &cur, &bisc->last);
>
> +propagate_up:
>  		/* propagate global delta to parent (unless that's root) */
> -		if (parent && parent->parent)
> +		if (parent && parent->parent) {
>  			blkcg_iostat_update(parent, &blkg->iostat.cur,
>  					    &blkg->iostat.last);
> +			/*
> +			 * Queue parent->iostat to its blkcg's lockless
> +			 * list to propagate up to the grandparent if the
> +			 * iostat hasn't been queued yet.
> +			 */
> +			if (!parent->iostat.lqueued) {
> +				struct llist_head *plhead;
> +
> +				plhead = per_cpu_ptr(parent->blkcg->lhead, cpu);
> +				llist_add(&parent->iostat.lnode, plhead);
> +				parent->iostat.lqueued = true;
> +			}
> +		}
>  	}
>  	raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
>  out:
> --
> 2.39.3
>

I've tested and confirmed this patch fixes the original issue. Thanks!
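
For readers following the thread, the propagation gap described in the commit message can be illustrated with a toy user-space model. This is only a sketch under simplifying assumptions, not kernel code: `struct node`, `toy_account()`, `toy_flush_one()` and `toy_demo()` are hypothetical names, a single `queued` flag stands in for membership in the per-cpu `lhead` lockless list, and locking, per-cpu data and sequence counters are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one cgroup level; NOT the kernel's struct blkcg_gq. */
struct node {
	struct node *parent;
	long cur;	/* total accumulated at this level */
	long last;	/* part of cur already pushed to the parent */
	long pending;	/* stand-in for unflushed per-cpu stats */
	bool queued;	/* stand-in for being on the lhead list */
};

/* Record I/O at a node: its per-cpu stats change, so it gets queued. */
static void toy_account(struct node *n, long bytes)
{
	assert(bytes >= 0);
	n->pending += bytes;
	n->queued = true;
}

/*
 * Flush one level. If nothing is queued the level is skipped entirely,
 * which mirrors the optimized flush walking only the lockless list.
 * queue_parent selects the behaviour with (true) or without (false)
 * the extra queuing step that the patch adds.
 */
static void toy_flush_one(struct node *n, bool queue_parent)
{
	struct node *parent = n->parent;

	if (!n->queued)
		return;
	n->queued = false;

	/* fold the pending per-cpu values into this level's total */
	n->cur += n->pending;
	n->pending = 0;

	/* propagate the delta to the parent, unless the parent is root */
	if (parent && parent->parent) {
		parent->cur += n->cur - n->last;
		n->last = n->cur;
		if (queue_parent)
			parent->queued = true;
	}
}

/* root -> a -> b -> c, with I/O only in c; returns what a ends up seeing. */
static long toy_demo(bool queue_parent)
{
	struct node root = { 0 };
	struct node a = { .parent = &root };
	struct node b = { .parent = &a };
	struct node c = { .parent = &b };

	toy_account(&c, 100);

	/* lowest level first, then the parents, as in cgroup_rstat_flush() */
	toy_flush_one(&c, queue_parent);
	toy_flush_one(&b, queue_parent);
	toy_flush_one(&a, queue_parent);

	return a.cur;
}
```

In this model `toy_demo(false)` returns 0, because b is never queued and its flush is skipped, so the delta stops at b. `toy_demo(true)` returns 100, because queuing b during c's flush lets the delta continue up to a.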