Hi, 在 2023/01/10 2:23, Tejun Heo 写道:
Yeah, that's unfortunate. There are several options here: 1. Do what you originally suggested - bypass to root after offline. I feel uneasy about this. Both iolatency and throtl clear their configs on offline but that's punting to the parent. For iocost it'd be bypassing all controls, which can actually be exploited. 2. Make all possible IO issuers use blkcg_[un]pin_online() and shift the iocost shutdown to pd_offline_fn(). This likely is the most canonical solution given the current situation but it's kinda nasty to add another layer of refcnting all over the place. 3. Order blkg free so that parents are never freed before children. You did this by adding refcnts in iocost but shouldn't it be possible to simply shift blkg_put(blkg->parent) in __blkg_release() to blkg_free_workfn()?
As I tried to explain before, we can make sure blkg_free() is called in order, but blkg_free() from remove cgroup can concurrent with deactivate policy, and we can't guarantee the order of ioc_pd_free() that is called both from blkg_free() and blkcg_deactivate_policy(). Hence I don't think #3 is possible. I personaly prefer #1, I don't see any real use case about the defect that you described, and actually in cgroup v1 blk-throtl is bypassed to no limit as well. I'm not sure about #2, that sounds a possible solution but I'm not quite familiar with the implementations here. Consider that bfq already has such refcounting for bfqg, perhaps similiar refcounting is acceptable? Thanks, Kuai