> On Apr 11, 2024, at 14:44, Robert Pang <robertpang@xxxxxxxxxx> wrote:
> 
> Hi Coly
> 
> Thank you for submitting it in the next merge window. This patch is
> critical because the long IO stalls, measured in tens of seconds every
> hour, are a serious issue that makes bcache unusable when they happen.
> So we look forward to this patch.
> 
> Speaking of this GC issue, we gathered the bcache btree GC stats after
> our fio benchmark on a 375GB SSD cache device with 256kB bucket size:
> 
> $ grep . /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_*
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_duration_ms:45293
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_frequency_sec:286
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_last_sec:212
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_max_duration_ms:61986
> $ more /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_nodes
> 5876
> 
> However, fio directly on the SSD device itself shows pretty good performance:
> 
> Read IOPS 14,100 (110MiB/s)
> Write IOPS 42,200 (330MiB/s)
> Latency: 106.64 microseconds
> 
> Can you shed some light on why GC takes so long (avg 45 seconds) given
> the SSD speed? And is there any way or setting to reduce the GC time
> or lower the GC frequency?
> 
> One interesting thing we observed is that when the SSD is encrypted via
> dm-crypt, the GC time is shortened by ~80% to under 10 seconds. Is it
> possible that GC writes the blocks one by one synchronously, and
> dm-crypt's internal queuing and buffering mitigates the GC IO latency?

Hi Robert,

May I know which kernel version you tested the patch on?

I rebased the patch and applied it on Linux v6.9. With a 4TB SSD as the
cache device, I didn't observe an obvious performance advantage from this
patch, and occasionally I saw a bit more GC time. It might come from my
rebase modification in bch_btree_gc_finish():

@@ -1769,6 +1771,11 @@ static void bch_btree_gc_finish(struct cache_set *c)
 	c->gc_mark_valid = 1;
 	c->need_gc = 0;
 
+	ca = c->cache;
+	for_each_bucket(b, ca)
+		if (b->reclaimable_in_gc)
+			b->reclaimable_in_gc = 0;
+
 	for (i = 0; i < KEY_PTRS(&c->uuid_bucket); i++)
 		SET_GC_MARK(PTR_BUCKET(c, &c->uuid_bucket, i),
 			    GC_MARK_METADATA);

With this change, for_each_bucket() runs twice in bch_btree_gc_finish().
It is probably not the cause of the GC time fluctuation, but iterating
all buckets twice in this patch feels a bit uncomfortable to me.

Hi Dongsheng,

Maybe my rebase is incorrect. Could you please post a new version that
applies to the latest upstream bcache code?

Thanks in advance.

Coly Li
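
P.S. To illustrate the double-iteration concern, below is a rough,
untested sketch of what I had in mind. It assumes the existing
accounting loop near the end of mainline bch_btree_gc_finish() and the
reclaimable_in_gc bucket field from the hunk above; whether it is
actually safe depends on whether anything between the two places reads
that flag before the tail loop runs:

	for_each_bucket(b, ca) {
		/*
		 * Fold the flag reset into the existing accounting pass
		 * instead of adding a second for_each_bucket() loop.
		 */
		if (b->reclaimable_in_gc)
			b->reclaimable_in_gc = 0;

		c->need_gc = max(c->need_gc, bucket_gc_gen(b));

		if (atomic_read(&b->pin))
			continue;

		BUG_ON(!GC_MARK(b) && GC_SECTORS_USED(b));

		if (!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE)
			c->avail_nbuckets++;
	}

That would keep a single pass over all buckets, at the cost of clearing
the flag a little later in the finish path.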