On Wed, May 24, 2023 at 8:11 PM Waiman Long <longman@xxxxxxxxxx> wrote: > > On 5/24/23 23:04, Waiman Long wrote: > >> Hi Waiman, > >> > >> I don't have context about blkcg, but isn't this exactly what > >> cgroup_rstat_lock does? Is it too expensive to just call > >> cgroup_rstat_flush () here? > > > > I have thought about that too. However, in my test, just calling > > cgroup_rstat_flush() in blkcg_destroy_blkgs() did not prevent dying > > blkcgs from increasing meaning that there are still some extra > > references blocking its removal. I haven't figured out exactly why > > that is the case. There may still be some races that we have not fully > > understood yet. On the other hand, Ming's patch is verified to not do > > that since it does not take extra blkg references. So I am leaning on > > his patch now. I just have to make sure that there is no concurrent > > rstat update. > > It is also true that calling cgroup_rstat_flush() is much more > expensive, especially at the blkg level where there can be many blkgs to > be destroyed when a blkcg is destroyed. I see. For what's it worth, consequent flushes on the same cgroup should be fast since the first flush should remove it from the rstat updated list (unless updates come from another css on the same cgroup while the blkgs are being destroyed). I am assuming in this case we are sure that the cgroup does not have children, otherwise we need to flush them as well, not just the cgroup, so calling __blkcg_rstat_flush() directly might not be a viable option. Perhaps we can add an API to cgroup rstat flushing that just holds/releases the rstat lock, to avoid adding one extra lock on the blkcg side that does the same job. Alternatively, we can have a version of cgroup_rstat_flush() that only flushes one subsystem; but then we will have to iterate the cgroups on the rstat updated tree without popping them. Another approach would be to split the rstat updated tree to have one per-subsystem so that they don't slow each other down; but then we need to figure out what to do about base stat flush and BPF flushing. Just thinking out loud here :) > > Cheers, > Longman >