On 5/28/22 6:22 AM, Coly Li wrote: > > >> 2022?5?28? 20:20?Jens Axboe <axboe@xxxxxxxxx> ??? >> >> On 5/28/22 12:19 AM, Coly Li wrote: >>> The kworker routine update_writeback_rate() is schedued to update the >>> writeback rate in every 5 seconds by default. Before calling >>> __update_writeback_rate() to do real job, semaphore dc->writeback_lock >>> should be held by the kworker routine. >>> >>> At the same time, bcache writeback thread routine bch_writeback_thread() >>> also needs to hold dc->writeback_lock before flushing dirty data back >>> into the backing device. If the dirty data set is large, it might be >>> very long time for bch_writeback_thread() to scan all dirty buckets and >>> releases dc->writeback_lock. In such case update_writeback_rate() can be >>> starved for long enough time so that kernel reports a soft lockup warn- >>> ing started like: >>> watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713] >>> >>> Such soft lockup condition is unnecessary, because after the writeback >>> thread finishes its job and releases dc->writeback_lock, the kworker >>> update_writeback_rate() may continue to work and everything is fine >>> indeed. >>> >>> This patch avoids the unnecessary soft lockup by the following method, >>> - Add new member to struct cached_dev >>> - dc->rate_update_retry (0 by default) >>> - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock) >>> firstly, if it fails then lock contention happens. >>> - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't >>> acquire the lock and reschedules the kworker for next try. >>> - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry >>> anymore and call down_read(&dc->writeback_lock) to wait for the lock. >>> >>> By the above method, at worst case update_writeback_rate() may retry for >>> 1+ minutes before blocking on dc->writeback_lock by calling down_read(). >>> For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft >>> lockup warning message can be avoided. >>> >>> When retrying to acquire dc->writeback_lock in update_writeback_rate(), >>> of course the writeback rate cannot be updated. It is fair, because when >>> the kworker is blocked on the lock contention of dc->writeback_lock, the >>> writeback rate cannot be updated neither. >>> >>> This change follows Jens Axboe's suggestion to a more clear and simple >>> version. >> >> This looks fine, but it doesn't apply to my current for-5.19/drivers >> branch which the previous ones did. Did you spin this one without the >> other patches, perhaps? >> >> One minor thing we might want to change if you're respinning it - >> BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since >> it doesn't retry anything, it simply allows updates to be skipped. Why >> not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd be >> better convey what it does. > > Naming is often challenge for me. Sure, _MAX_SKIPS is better. I will > post another modified version. It's hard for everyone :-) Sounds good, I'll get it applied when it shows up. -- Jens Axboe