Re: [PATCH 1/1] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()

Coly Li <colyli@xxxxxxx> · Sat, 28 May 2022 20:22:24 +0800

> 2022年5月28日 20:20，Jens Axboe <axboe@xxxxxxxxx> 写道：
> 
> On 5/28/22 12:19 AM, Coly Li wrote:
>> The kworker routine update_writeback_rate() is schedued to update the
>> writeback rate in every 5 seconds by default. Before calling
>> __update_writeback_rate() to do real job, semaphore dc->writeback_lock
>> should be held by the kworker routine.
>> 
>> At the same time, bcache writeback thread routine bch_writeback_thread()
>> also needs to hold dc->writeback_lock before flushing dirty data back
>> into the backing device. If the dirty data set is large, it might be
>> very long time for bch_writeback_thread() to scan all dirty buckets and
>> releases dc->writeback_lock. In such case update_writeback_rate() can be
>> starved for long enough time so that kernel reports a soft lockup warn-
>> ing started like:
>>  watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]
>> 
>> Such soft lockup condition is unnecessary, because after the writeback
>> thread finishes its job and releases dc->writeback_lock, the kworker
>> update_writeback_rate() may continue to work and everything is fine
>> indeed.
>> 
>> This patch avoids the unnecessary soft lockup by the following method,
>> - Add new member to struct cached_dev
>>  - dc->rate_update_retry (0 by default)
>> - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock)
>>  firstly, if it fails then lock contention happens.
>> - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't
>>  acquire the lock and reschedules the kworker for next try.
>> - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry
>>  anymore and call down_read(&dc->writeback_lock) to wait for the lock.
>> 
>> By the above method, at worst case update_writeback_rate() may retry for
>> 1+ minutes before blocking on dc->writeback_lock by calling down_read().
>> For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft
>> lockup warning message can be avoided.
>> 
>> When retrying to acquire dc->writeback_lock in update_writeback_rate(),
>> of course the writeback rate cannot be updated. It is fair, because when
>> the kworker is blocked on the lock contention of dc->writeback_lock, the
>> writeback rate cannot be updated neither.
>> 
>> This change follows Jens Axboe's suggestion to a more clear and simple
>> version.
> 
> This looks fine, but it doesn't apply to my current for-5.19/drivers
> branch which the previous ones did. Did you spin this one without the
> other patches, perhaps?
> 
> One minor thing we might want to change if you're respinning it -
> BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since
> it doesn't retry anything, it simply allows updates to be skipped. Why
> not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd be
> better convey what it does.

Naming is often challenge for me. Sure, _MAX_SKIPS is better. I will post another modified version.

Thanks for the suggestion.

Coly Li