> On Apr 11, 2024, at 14:44, Robert Pang <robertpang@xxxxxxxxxx> wrote:
> 
> Hi Coly
> 
> Thank you for submitting it in the next merge window. This patch is
> critical because the long IO stalls, measured in tens of seconds every
> hour, are a serious issue that makes bcache unusable when they happen.
> So we look forward to this patch.
> 
> Speaking of this GC issue, we gathered the bcache btree GC stats after
> our fio benchmark on a 375GB SSD cache device with 256kB bucket size:
> 
> $ grep . /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_*
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_duration_ms:45293
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_frequency_sec:286
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_last_sec:212
> /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_max_duration_ms:61986
> $ more /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_nodes
> 5876
> 
> However, fio directly on the SSD device itself shows pretty good performance:
> 
> Read IOPS 14,100 (110MiB/s)
> Write IOPS 42,200 (330MiB/s)
> Latency: 106.64 microseconds
> 
> Can you shed some light on why GC takes so long (avg 45 seconds) given
> the SSD speed? And is there any way or setting to reduce the GC time
> or lower the GC frequency?
> 
> One interesting thing we observed is that when the SSD is encrypted via
> dm-crypt, the GC time is shortened by ~80% to under 10 seconds. Is it
> possible that GC writes the blocks one by one synchronously, and
> dm-crypt's internal queuing and buffering mitigates the GC IO latency?

Hi Robert,

May I know which kernel version you tested the patch on?

I rebased the patch and applied it on Linux v6.9. With a 4TB SSD as the
cache device, I didn't observe an obvious performance advantage from this
patch, and occasionally I saw a bit more GC time. It might come from my
rebase modification in bch_btree_gc_finish():

@@ -1769,6 +1771,11 @@ static void bch_btree_gc_finish(struct cache_set *c)
 	c->gc_mark_valid = 1;
 	c->need_gc = 0;
 
+	ca = c->cache;
+	for_each_bucket(b, ca)
+		if (b->reclaimable_in_gc)
+			b->reclaimable_in_gc = 0;
+
 	for (i = 0; i < KEY_PTRS(&c->uuid_bucket); i++)
 		SET_GC_MARK(PTR_BUCKET(c, &c->uuid_bucket, i),
 			    GC_MARK_METADATA);

With this change, for_each_bucket() runs twice in bch_btree_gc_finish().
It is probably not the cause of the GC time fluctuation, but iterating
all buckets twice in this patch feels a bit uncomfortable to me.

Hi Dongsheng,

Maybe my rebase is incorrect. Could you please post a new version that
applies to the latest upstream bcache code?

Thanks in advance.

Coly Li
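
P.S. To illustrate the double-iteration concern, below is a rough,
untested sketch of what I had in mind. It assumes the existing
accounting loop near the end of mainline bch_btree_gc_finish() and the
reclaimable_in_gc bucket field from the hunk above; whether it is
actually safe depends on whether anything between the two places reads
that flag before the tail loop runs:

	for_each_bucket(b, ca) {
		/*
		 * Fold the flag reset into the existing accounting pass
		 * instead of adding a second for_each_bucket() loop.
		 */
		if (b->reclaimable_in_gc)
			b->reclaimable_in_gc = 0;

		c->need_gc = max(c->need_gc, bucket_gc_gen(b));

		if (atomic_read(&b->pin))
			continue;

		BUG_ON(!GC_MARK(b) && GC_SECTORS_USED(b));

		if (!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE)
			c->avail_nbuckets++;
	}

That would keep a single pass over all buckets, at the cost of clearing
the flag a little later in the finish path.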