> > On 4/9/22 2:58 PM, 李磊 wrote:
> >> The kworker routine update_writeback_rate() is scheduled to update
> >> the writeback rate every 5 seconds by default. Before calling
> >> __update_writeback_rate() to do the real job, the semaphore
> >> dc->writeback_lock should be held by the kworker routine.
> >>
> >> At the same time, the bcache writeback thread routine
> >> bch_writeback_thread() also needs to hold dc->writeback_lock before
> >> flushing dirty data back into the backing device. If the dirty data
> >> set is large, it might take a very long time for
> >> bch_writeback_thread() to scan all dirty buckets and release
> >> dc->writeback_lock.
> >
> > Hi Coly,
> >
> > cached_dev_write() needs dc->writeback_lock; if the writeback thread
> > holds writeback_lock too long, high write IO latency may happen. I wonder
>
> From my observation, such a situation happens in one of the last scans
> before all dirty data gets flushed. If the cache device is very large,
> and there are only a few dirty keys, such a scan will take quite a
> long time.
>
> It wasn't a problem years ago, but nowadays it is easy to have a 10TB+
> cache device, and now the latency is observed.
>
> > if it would be nicer to limit the scale of the scanning in writeback.
> > For example,
> > just scan 512GB instead of the whole cache disk.
>
> Scanning each 512GB of space doesn't help much, because the current
> btree iteration code doesn't handle continuing from where it stopped
> very well: the next time it continues from where it stopped, the
> previous key might already be invalidated.
>
> An ideal way might be to split the single large btree into multiple
> ones. People suggested that I divide the single tree into e.g. 64 or
> 128 trees, and only lock a single tree when doing writeback or gc on
> one of the trees. Maybe now it is about time to think it over again...
>
I got it. Thanks for the detailed explanation.
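
To make the multi-tree idea above concrete, here is a minimal userspace
sketch of the locking granularity it would buy. Everything in it
(NR_SUBTREES, struct subtree, key_to_subtree(), the pthread rwlocks) is
a hypothetical illustration, not actual bcache code; bcache's real
btree and writeback_lock handling are considerably more involved.

/*
 * Sketch: split one big btree into NR_SUBTREES smaller trees, each
 * with its own lock, so a writeback scan blocks only 1/NR_SUBTREES of
 * the keyspace at a time. All names here are hypothetical, not bcache
 * code. Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NR_SUBTREES 64	/* e.g. 64 or 128 trees, as suggested above */

struct subtree {
	pthread_rwlock_t lock;	/* stands in for a per-tree writeback_lock */
	/* the root of this subtree's btree would live here */
};

static struct subtree subtrees[NR_SUBTREES];

/* Route a key to one subtree, e.g. by its offset in the keyspace. */
static struct subtree *key_to_subtree(uint64_t key)
{
	return &subtrees[key % NR_SUBTREES];
}

/*
 * Foreground write path: only the subtree covering this key is
 * locked, so a long writeback scan of another subtree cannot stall it.
 */
static void foreground_write(uint64_t key)
{
	struct subtree *t = key_to_subtree(key);

	pthread_rwlock_wrlock(&t->lock);
	/* ... insert/update the dirty key in this subtree ... */
	pthread_rwlock_unlock(&t->lock);
}

/*
 * Writeback scan: visit subtrees one at a time, taking and dropping
 * each subtree's lock, instead of holding one device-wide lock for
 * the whole scan. (A real implementation would also need write access
 * to mark flushed keys clean.)
 */
static void writeback_scan(void)
{
	for (int i = 0; i < NR_SUBTREES; i++) {
		pthread_rwlock_rdlock(&subtrees[i].lock);
		/* ... scan this subtree for dirty keys and flush them ... */
		pthread_rwlock_unlock(&subtrees[i].lock);
	}
}

int main(void)
{
	for (int i = 0; i < NR_SUBTREES; i++)
		pthread_rwlock_init(&subtrees[i].lock, NULL);

	foreground_write(42);
	writeback_scan();
	printf("scanned %d subtrees\n", NR_SUBTREES);
	return 0;
}

The only point of the sketch is the granularity: a cached_dev_write()
waiting on one subtree's lock no longer has to wait for a scan of the
other 63 subtrees, which is exactly the latency problem described in
the thread.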