> > On 4/9/22 2:58 PM, 李磊 wrote:
> >> The kworker routine update_writeback_rate() is scheduled to update
> >> the writeback rate every 5 seconds by default. Before calling
> >> __update_writeback_rate() to do the real job, the semaphore
> >> dc->writeback_lock should be held by the kworker routine.
> >>
> >> At the same time, the bcache writeback thread routine
> >> bch_writeback_thread() also needs to hold dc->writeback_lock before
> >> flushing dirty data back into the backing device. If the dirty data
> >> set is large, it might take a very long time for
> >> bch_writeback_thread() to scan all dirty buckets and release
> >> dc->writeback_lock.
> >
> > Hi Coly,
> >
> > cached_dev_write() needs dc->writeback_lock; if the writeback thread
> > holds writeback_lock too long, high write IO latency may happen. I wonder
>
> From my observation, such a situation happens in one of the last scans
> before all dirty data gets flushed. If the cache device is very large,
> and there are only a few dirty keys, such a scan will take quite a
> long time.
>
> It wasn't a problem years ago, but nowadays it is easy to have a 10TB+
> cache device, and now the latency is observed.
>
> > if it would be nicer to limit the scale of the scanning in writeback.
> > For example,
> > just scan 512GB instead of the whole cache disk.
>
> Scanning each 512GB of space doesn't help much, because the current
> btree iteration code doesn't handle continuing from where it stopped
> very well: the next time it continues from where it stopped, the
> previous key might already be invalidated.
>
> An ideal way might be to split the single large btree into multiple
> ones. People suggested that I divide the single tree into e.g. 64 or
> 128 trees, and only lock a single tree when doing writeback or gc on
> one of the trees. Maybe now it is about time to think it over again...
>
I got it. Thanks for the detailed explanation.
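
To make the multi-tree idea above concrete, here is a minimal userspace
sketch of the locking granularity it would buy. Everything in it
(NR_SUBTREES, struct subtree, key_to_subtree(), the pthread rwlocks) is
a hypothetical illustration, not actual bcache code; bcache's real
btree and writeback_lock handling are considerably more involved.

/*
 * Sketch: split one big btree into NR_SUBTREES smaller trees, each
 * with its own lock, so a writeback scan blocks only 1/NR_SUBTREES of
 * the keyspace at a time. All names here are hypothetical, not bcache
 * code. Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NR_SUBTREES 64	/* e.g. 64 or 128 trees, as suggested above */

struct subtree {
	pthread_rwlock_t lock;	/* stands in for a per-tree writeback_lock */
	/* the root of this subtree's btree would live here */
};

static struct subtree subtrees[NR_SUBTREES];

/* Route a key to one subtree, e.g. by its offset in the keyspace. */
static struct subtree *key_to_subtree(uint64_t key)
{
	return &subtrees[key % NR_SUBTREES];
}

/*
 * Foreground write path: only the subtree covering this key is
 * locked, so a long writeback scan of another subtree cannot stall it.
 */
static void foreground_write(uint64_t key)
{
	struct subtree *t = key_to_subtree(key);

	pthread_rwlock_wrlock(&t->lock);
	/* ... insert/update the dirty key in this subtree ... */
	pthread_rwlock_unlock(&t->lock);
}

/*
 * Writeback scan: visit subtrees one at a time, taking and dropping
 * each subtree's lock, instead of holding one device-wide lock for
 * the whole scan. (A real implementation would also need write access
 * to mark flushed keys clean.)
 */
static void writeback_scan(void)
{
	for (int i = 0; i < NR_SUBTREES; i++) {
		pthread_rwlock_rdlock(&subtrees[i].lock);
		/* ... scan this subtree for dirty keys and flush them ... */
		pthread_rwlock_unlock(&subtrees[i].lock);
	}
}

int main(void)
{
	for (int i = 0; i < NR_SUBTREES; i++)
		pthread_rwlock_init(&subtrees[i].lock, NULL);

	foreground_write(42);
	writeback_scan();
	printf("scanned %d subtrees\n", NR_SUBTREES);
	return 0;
}

The only point of the sketch is the granularity: a cached_dev_write()
waiting on one subtree's lock no longer has to wait for a scan of the
other 63 subtrees, which is exactly the latency problem described in
the thread.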