Re: [PATCH V2] blk-cgroup: Flush stats before releasing blkcg_gq

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Wed, 24 May 2023 20:23:28 -0700

On Wed, May 24, 2023 at 8:11 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>
> On 5/24/23 23:04, Waiman Long wrote:
> >> Hi Waiman,
> >>
> >> I don't have context about blkcg, but isn't this exactly what
> >> cgroup_rstat_lock does? Is it too expensive to just call
> >> cgroup_rstat_flush () here?
> >
> > I have thought about that too. However, in my test, just calling
> > cgroup_rstat_flush() in blkcg_destroy_blkgs() did not prevent dying
> > blkcgs from increasing meaning that there are still some extra
> > references blocking its removal. I haven't figured out exactly why
> > that is the case. There may still be some races that we have not fully
> > understood yet. On the other hand, Ming's patch is verified to not do
> > that since it does not take extra blkg references. So I am leaning on
> > his patch now. I just have to make sure that there is no concurrent
> > rstat update.
>
> It is also true that calling cgroup_rstat_flush() is much more
> expensive, especially at the blkg level where there can be many blkgs to
> be destroyed when a blkcg is destroyed.

I see. For what's it worth, consequent flushes on the same cgroup
should be fast since the first flush should remove it from the rstat
updated list (unless updates come from another css on the same cgroup
while the blkgs are being destroyed).

I am assuming in this case we are sure that the cgroup does not have
children, otherwise we need to flush them as well, not just the
cgroup, so calling __blkcg_rstat_flush() directly might not be a
viable option.

Perhaps we can add an API to cgroup rstat flushing that just
holds/releases the rstat lock, to avoid adding one extra lock on the
blkcg side that does the same job.

Alternatively, we can have a version of cgroup_rstat_flush() that only
flushes one subsystem; but then we will have to iterate the cgroups on
the rstat updated tree without popping them. Another approach would be
to split the rstat updated tree to have one per-subsystem so that they
don't slow each other down; but then we need to figure out what to do
about base stat flush and BPF flushing. Just thinking out loud here :)

>
> Cheers,
> Longman
>