Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir

Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx> · Sat, 24 Feb 2018 09:45:49 +0800

Hi Tejun,

On 18/2/23 22:23, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 23, 2018 at 09:56:54AM +0800, xuejiufei wrote:
>>> On Thu, Feb 22, 2018 at 02:14:34PM +0800, Joseph Qi wrote:
>>>> I still don't get how css_tryget can work here.
>>>>
>>>> The race happens when:
>>>> 1) writeback kworker has found the blkg with rcu;
>>>> 2) blkcg is during offlining and blkg_destroy() has already been called.
>>>> Then, writeback kworker will take queue lock and access the blkg with
>>>> refcount 0.
>>>
>>> Yeah, then tryget would fail and it should go through the root.
>>>
>> In this race, the refcount of blkg becomes zero and is destroyed.
>> However css may still have refcount, and css_tryget can return success
>> before other callers put the refcount.
>> So I don't get how css_tryget can fix this race? Or I wonder if we can
>> add another function blkg_tryget?
> 
> IIRC, as long as the blkcg and the device are there, the blkgs aren't
> gonna be destroyed.  So, if you have a ref to the blkcg through
> tryget, the blkg shouldn't go away.
> 

Maybe we have misunderstanding here.

In this case, blkg doesn't go away as we have rcu protect, but
blkg_destroy() can be called, in which blkg_put() will put the last
refcnt and then schedule __blkg_release_rcu().

css refcnt can't prevent blkcg css from offlining, instead it is css
online_cnt.

css_tryget() will only get a refcnt of blkcg css, but can't be
guaranteed to fail when css is confirmed to kill.

The race sequence:
writeback kworker                   cgroup_rmdir
                                      cgroup_destroy_locked
                                        kill_css
                                          css_killed_ref_fn
                                            css_killed_work_fn
                                              offline_css
                                                blkcg_css_offline
  blkcg_bio_issue_check
    rcu_read_lock
    blkg_lookup
                                                  spin_trylock(q->queue_lock)
                                                  blkg_destroy
                                                  spin_unlock(q->queue_lock)
    blk_throtl_bio
      spin_lock_irq(q->queue_lock)
      spin_unlock_irq(q->queue_lock)
    rcu_read_unlock

Thanks,
Joseph
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html