Re: [PATCH v2] bcache: allow allocator to invalidate bucket in gc

Hi Robert,

Thanks for your email.

> On Mar 16, 2024, at 06:45, Robert Pang <robertpang@xxxxxxxxxx> wrote:
> 
> Hi all
> 
> We found this patch via Google.
> 
> We have a setup that uses bcache to cache network-attached storage on a local SSD. Under heavy traffic, IO on the cached device stalls every hour or so for tens of seconds. When we track latency continuously with the "fio" utility, we can see the max IO latency shoot up when the stall happens:
> 
> latency_test: (groupid=0, jobs=1): err= 0: pid=50416: Fri Mar 15 21:14:18 2024
>  read: IOPS=62.3k, BW=486MiB/s (510MB/s)(11.4GiB/24000msec)
>    slat (nsec): min=1377, max=98964, avg=4567.31, stdev=1330.69
>    clat (nsec): min=367, max=43682, avg=429.77, stdev=234.70
>     lat (nsec): min=1866, max=105301, avg=5068.60, stdev=1383.14
>    clat percentiles (nsec):
>     |  1.00th=[  386],  5.00th=[  406], 10.00th=[  406], 20.00th=[  410],
>     | 30.00th=[  414], 40.00th=[  414], 50.00th=[  414], 60.00th=[  418],
>     | 70.00th=[  418], 80.00th=[  422], 90.00th=[  426], 95.00th=[  462],
>     | 99.00th=[  652], 99.50th=[  708], 99.90th=[ 3088], 99.95th=[ 5600],
>     | 99.99th=[11328]
>   bw (  KiB/s): min=318192, max=627591, per=99.97%, avg=497939.04, stdev=81923.63, samples=47
>   iops        : min=39774, max=78448, avg=62242.15, stdev=10240.39, samples=47
> ...
> 
> <IO stall>
> 
> latency_test: (groupid=0, jobs=1): err= 0: pid=50416: Fri Mar 15 21:21:23 2024
>  read: IOPS=26.0k, BW=203MiB/s (213MB/s)(89.1GiB/448867msec)
>    slat (nsec): min=958, max=40745M, avg=15596.66, stdev=13650543.09
>    clat (nsec): min=364, max=104599, avg=435.81, stdev=302.81
>     lat (nsec): min=1416, max=40745M, avg=16104.06, stdev=13650546.77
>    clat percentiles (nsec):
>     |  1.00th=[  378],  5.00th=[  390], 10.00th=[  406], 20.00th=[  410],
>     | 30.00th=[  414], 40.00th=[  414], 50.00th=[  418], 60.00th=[  418],
>     | 70.00th=[  418], 80.00th=[  422], 90.00th=[  426], 95.00th=[  494],
>     | 99.00th=[  772], 99.50th=[  916], 99.90th=[ 3856], 99.95th=[ 5920],
>     | 99.99th=[10816]
>   bw (  KiB/s): min=    1, max=627591, per=100.00%, avg=244393.77, stdev=103534.74, samples=765
>   iops        : min=    0, max=78448, avg=30549.06, stdev=12941.82, samples=765
> 
> When we track per-second max latency in fio, we see something like this:
> 
> <time-ms>,<max-latency-ns>,,,
> ...
> 777000, 5155548, 0, 0, 0
> 778000, 105551, 1, 0, 0
> 802615, 24276019570, 0, 0, 0
> 802615, 82134, 1, 0, 0
> 804000, 9944554, 0, 0, 0
> 805000, 7424638, 1, 0, 0
> 
> fio --time_based --runtime=3600s --ramp_time=2s --ioengine=libaio --name=latency_test --filename=fio --bs=8k --iodepth=1 --size=900G  --readwrite=randrw --verify=0 --filename=fio --write_lat_log=lat --log_avg_msec=1000 --log_max_value=1
> 
> We saw a similar issue reported in https://www.spinics.net/lists/linux-bcache/msg09578.html, which suggests a problem in garbage collection. When we trigger GC manually via "echo 1 > /sys/fs/bcache/a356bdb0-...-64f794387488/internal/trigger_gc", the stall is always reproduced. That thread points to this patch (https://www.spinics.net/lists/linux-bcache/msg08870.html). We tested it, and with the patch applied the stall no longer happens.
> 
> AFAIK, this patch marks buckets reclaimable at the beginning of GC to unblock the allocator so it does not need to wait for GC to finish. This periodic stall is a serious issue. Can the community look at this issue and this patch if possible?
> 

Could you please share more performance information about this patch? Also, how many nodes and how long a period has the testing covered so far?

The last time I tested the patch, it looked fine, but I was not confident about the scale or duration of that testing. If you can provide more testing information, it will be helpful.
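
One possible way to gather such numbers is to scan the per-second fio latency log for stalls. The following is only a minimal sketch, not part of the patch or of fio itself: it assumes the log rows look like the sample quoted above (time in ms, max latency in ns, then three further fields), and the function name find_stalls and the 1 second threshold are hypothetical choices for illustration.

```python
import csv
import io

# Assumption: a per-second max latency of 1 s or more counts as a stall.
STALL_THRESHOLD_NS = 1_000_000_000


def find_stalls(log_text, threshold_ns=STALL_THRESHOLD_NS):
    """Return (time_ms, latency_ns) for each log row at or above the threshold."""
    stalls = []
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            time_ms = int(row[0].strip())
            latency_ns = int(row[1].strip())
        except ValueError:
            continue  # header or "..." line, not a data row
        if latency_ns >= threshold_ns:
            stalls.append((time_ms, latency_ns))
    return stalls


# Sample rows copied from the quoted report above.
SAMPLE = """\
777000, 5155548, 0, 0, 0
778000, 105551, 1, 0, 0
802615, 24276019570, 0, 0, 0
802615, 82134, 1, 0, 0
804000, 9944554, 0, 0, 0
805000, 7424638, 1, 0, 0
"""

if __name__ == "__main__":
    for time_ms, latency_ns in find_stalls(SAMPLE):
        print(f"t={time_ms / 1000:.1f}s max latency {latency_ns / 1e9:.2f}s")
```

Run over logs from each test node, with and without the patch, the count and size of the reported rows would give a rough picture of how often the stall occurs and how long it lasts.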
 

Coly Li



