Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Tejun,
Thanks very much for reviewing this patch.

On 18/2/8 05:38, Tejun Heo wrote:
> Hello, Joseph.
> 
> On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
>> writeback kworker
>>   blkcg_bio_issue_check
>>     rcu_read_lock
>>     blkg_lookup
>>     <<< *race window*
>>     blk_throtl_bio
>>       spin_lock_irq(q->queue_lock)
>>       spin_unlock_irq(q->queue_lock)
>>     rcu_read_unlock
>>
>> cgroup_rmdir
>>   cgroup_destroy_locked
>>     kill_css
>>       css_killed_ref_fn
>>         css_killed_work_fn
>>           offline_css
>>             blkcg_css_offline
>>               spin_trylock(q->queue_lock)
>>               blkg_destroy
>>               spin_unlock(q->queue_lock)
> 
> Ah, right.  Thanks for spotting the bug.
> 
>> Since rcu can only prevent blkg from releasing when it is being used,
>> the blkg->refcnt can be decreased to 0 during blkg_destroy and schedule
>> blkg release.
>> Then trying to blkg_get in blk_throtl_bio will complains the WARNING.
>> And then the corresponding blkg_put will schedule blkg release again,
>> which result in double free.
>> This race is introduced by commit ae1188963611 ("blkcg: consolidate blkg
>> creation in blkcg_bio_issue_check()"). Before this commit, it will lookup
>> first and then try to lookup/create again with queue_lock. So revive
>> this logic to fix the race.
> 
> The change seems a bit drastic to me.  Can't we do something like the
> following instead?
> 
> blk_throtl_bio()
> {
> 	... non throttled cases ...
> 
> 	/* out-of-limit, queue to @tg */
> 
> 	/*
> 	 * We can look up and retry but the race window is tiny here.
> 	 * Just letting it through should be good enough.
> 	 */
> 	if (!css_tryget(blkcg->css))
> 		goto out;
> 
> 	... actual queueing ...
> 	css_put(blkcg->css);
> 	...
> }
So you mean checking css->refcnt to prevent the further use of
blkg_get? I think it makes sense.
IMO, we should use css_tryget_online instead, and rightly after taking
queue_lock. Because there may be more use of blkg_get in blk_throtl_bio
in the futher. Actually it already has two now. One is in
blk_throtl_assoc_bio, and the other is in throtl_qnode_add_bio.
What do you think of this?

Thanks,
Joseph



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux