On 4/9/22 2:58 PM, 李磊 wrote:
> The kworker routine update_writeback_rate() is scheduled to update the
> writeback rate every 5 seconds by default. Before calling
> __update_writeback_rate() to do the real job, the semaphore
> dc->writeback_lock should be held by the kworker routine.
>
> At the same time, the bcache writeback thread routine
> bch_writeback_thread() also needs to hold dc->writeback_lock before
> flushing dirty data back to the backing device. If the dirty data set is
> large, it might take a very long time for bch_writeback_thread() to scan
> all dirty buckets and release dc->writeback_lock.
> Hi Coly,
>
> cached_dev_write() needs dc->writeback_lock; if the writeback thread
> holds writeback_lock for too long, high write IO latency may happen. I
> wonder if it would be a nicer way to limit the scale of the scanning in
> writeback. For example, just scan 512GB instead of the whole cache disk.

From my observation, such a situation happens in one of the last scans
before all dirty data gets flushed. If the cache device is very large and
there are only a few dirty keys, such a scan takes quite a long time. It
wasn't a problem years ago, but currently it is easy to have a 10TB+
cache device, and now the latency is observed.
Scanning each 512GB range doesn't help much, because the current btree
iteration code doesn't handle continuing from where it stopped very
well: the next time the scan resumes from where it stopped, the
previously recorded key might already be invalidated.

An ideal way might be to split the single large btree into multiple
ones. People suggested I divide the single tree into e.g. 64 or 128
trees, and only lock a single tree when doing writeback or gc on one of
the trees. Maybe now it is about time to think it over again...
Coly Li