Re: Large latency with bcache for Ceph OSD(new mail thread)

"Norman.Kern" <norman.kern@xxxxxxx> · Mon, 8 Mar 2021 13:47:18 +0800



On 2021/3/5 6:03 PM, Coly Li wrote:
> On 3/5/21 5:00 PM, Norman.Kern wrote:
>> On 2021/3/2 9:20 PM, Coly Li wrote:
>>> On 3/2/21 6:20 PM, Norman.Kern wrote:
>>>> Sorry for creating a new mail thread (the original one is so long...)
>>>>
>>>>
>>>> I ran the test again and got more information:
>>>>
>>>> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
>>>> 0.0k
>>>> root@WXS0089:~# lsblk /dev/sda
>>>> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>>> sda         8:0    0 447.1G  0 disk
>>>> `-bcache0 252:0    0  10.9T  0 disk
>>>> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
>>>> Unused:         1%
>>>> Clean:          29%
>>>> Dirty:          70%
>>>> Metadata:       0%
>>>> Average:        49
>>>> Sectors per Q:  29184768
>>>> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
>>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
>>>> 1
>>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
>>>> 28
>>>>
>>>> I read the source code and found that if cache_available_percent > 50, it should wake up gc while doing writeback, but it does not seem to work correctly.
>>>>
>>> If gc_after_writeback is enabled, and the cache usage exceeds 50% after
>>> it is enabled, the BCH_DO_AUTO_GC flag will be set in
>>> c->gc_after_writeback. Then, when the writeback completes, the gc thread
>>> will be forcibly woken up.
>>>
>>> So the auto gc after writeback will be triggered when:
>>> 1, the bcache device is in writeback mode
>>> 2, gc_after_writeback is set to 1
>>> 3, after 2) is done, the cache usage exceeds the 50% threshold
>>> 4, the writeback rate is set to the maximum rate when the bcache device
>>> is idle (no regular I/O requests)
>>> 5, after the writeback is accomplished, the gc thread will be woken up.
>>>
>>> But /sys/block/bcache0/bcache/dirty_data showing 0.0k doesn't mean the
>>> writeback is accomplished. It is possible that the writeback thread is
>>> still going through all btree keys for a last pass even after all the
>>> dirty data has been flushed. Therefore you should check whether the
>>> writeback thread is still active before concluding that the writeback
>>> is completed.
>>>
>>> BTW, did you try a Linux v5.8+ kernel to see how things are?
>> I have tested on 5.8.X, but it doesn't help. I tested the same config on another server (480G SSD + 8T HDD),
>>
> What do you mean by "doesn't help"? Do you mean the forced gc does not
> trigger, or something else?
The cache_available_percent didn't reset to 100 automatically, even a very long time after all I/O had finished. I had to echo 1 to trigger_gc to make it recover.
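
For reference, the manual workaround can be sketched as a small shell helper. This is only a sketch: the function name maybe_trigger_gc and the threshold argument are made up for illustration, and it assumes trigger_gc sits under the cache set's internal/ directory, next to gc_after_writeback.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcache): force a gc when the cache set
# reports a low cache_available_percent.
#   $1: cache set directory, e.g. /sys/fs/bcache/<cache-set-uuid>
#   $2: illustrative threshold in percent (an assumption, tune as needed)
maybe_trigger_gc() {
    set_dir=$1
    threshold=$2

    # Read the current percentage; bail out if the file is not readable.
    avail=$(cat "$set_dir/cache_available_percent") || return 1

    if [ "$avail" -lt "$threshold" ]; then
        # Same as the manual workaround: echo 1 to trigger_gc.
        echo 1 > "$set_dir/internal/trigger_gc"
        echo "triggered gc (available ${avail}%)"
    else
        echo "no gc needed (available ${avail}%)"
    fi
}
```

On this server it could be called (as root) as maybe_trigger_gc /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b 30, where 30 is just an example value.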
>
>> and it can't be reproduced there, which really confuses me. I will compare the configs and try to find the differences.
> Which behavior does not reproduce?
The problem of cache_available_percent not being recovered automatically.
>
> Thanks.
>
> Coly Li