Re: [PATCH v2] bcache: allow allocator to invalidate bucket in gc

Hi Coly

Thank you for planning to submit it in the next merge window. This patch
is critical for us: the IO stalls, which last tens of seconds and occur
roughly every hour, are serious enough to make bcache unusable while they
happen. We look forward to seeing it merged.

Speaking of this GC issue, here are the bcache btree GC stats we gathered
after our fio benchmark on a 375 GB SSD cache device with a 256 kB bucket size:

$ grep . /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_*
/sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_duration_ms:45293
/sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_average_frequency_sec:286
/sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_last_sec:212
/sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_gc_max_duration_ms:61986
$ more /sys/fs/bcache/31c945a7-d96c-499b-945c-d76a1ab0beda/internal/btree_nodes
5876

However, running fio directly against the SSD device itself shows good performance:

Read: 14,100 IOPS (110 MiB/s)
Write: 42,200 IOPS (330 MiB/s)
Latency: 106.64 microseconds
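
To give a concrete idea of the kind of run we mean, a fio invocation
roughly like the one below produces this sort of mixed read/write result
(the device path and every parameter here are illustrative, not our
exact job file):

# mixed 4k random IO, direct to the device, queued via libaio
$ fio --name=ssd-test --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=25 --bs=4k --iodepth=32 --numjobs=4 \
      --runtime=600 --time_based --group_reporting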

Can you shed some light on why GC takes so long (45 seconds on average)
given the SSD speed? And is there any way or setting to reduce the GC
time or lower the GC frequency?

One interesting thing we observed is that when the SSD is encrypted via
dm-crypt, the GC time is shortened by about 80%, to under 10 seconds. Is
it possible that GC writes the blocks one by one synchronously, and that
dm-crypt's internal queuing and buffering mitigates the GC IO latency?
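
For context, that test stacked bcache on top of a LUKS-opened device in
the usual way; a rough sketch of the stacking we mean (device names and
the bucket size below are placeholders, not our exact setup) is:

# open the SSD through dm-crypt, then use the mapped device as the cache
$ cryptsetup luksFormat /dev/nvme0n1
$ cryptsetup open /dev/nvme0n1 cryptssd
$ make-bcache -C --bucket 256k /dev/mapper/cryptssd
$ make-bcache -B /dev/sdb
$ echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach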

Thanks
Robert


On Fri, Mar 29, 2024 at 6:00 AM Coly Li <colyli@xxxxxxx> wrote:
>
>
>
> > On Mar 29, 2024, at 02:05, Robert Pang <robertpang@xxxxxxxxxx> wrote:
> >
> > Hi bcache developers
> >
> > Greetings. Any update on this patch? How are things going with the
> > testing and submission upstream?
>
> Hi Peng,
>
> As I said, it will be in the next merge window, not this one. If any help is needed, I will ask :-)
>
> Thanks.
>
> Coly Li
>
>
> >
> >
> > On Sun, Mar 17, 2024 at 11:16 PM Robert Pang <robertpang@xxxxxxxxxx> wrote:
> >>
> >> Hi Coly
> >>
> >> Thank you for confirming. It looks like the 6.9 merge window just
> >> opened last week, so we hope this patch can make it in. Please update
> >> this thread when it gets submitted.
> >>
> >> https://lore.kernel.org/lkml/CAHk-=wiehc0DfPtL6fC2=bFuyzkTnuiuYSQrr6JTQxQao6pq1Q@xxxxxxxxxxxxxx/T/
> >>
> >> BTW, speaking of testing, would you mind pointing us to the bcache test
> >> suite? We would like to have a look and maybe give it a try as well.
> >>
> >> Thanks
> >> Robert
> >>
> >> On Sun, Mar 17, 2024 at 7:00 AM Coly Li <colyli@xxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>>> On Mar 17, 2024, at 13:41, Robert Pang <robertpang@xxxxxxxxxx> wrote:
> >>>>
> >>>> Hi Coly
> >>>>
> >>>
> >>> Hi Robert,
> >>>
> >>>> Thank you for looking into this issue.
> >>>>
> >>>> We tested this patch on 5 machines with local SSDs ranging in size from
> >>>> 375 GB to 9 TB, and ran tests for 10 to 12 hours on each. We observed no
> >>>> stalls or other issues. Performance was comparable before and after
> >>>> the patch. Hope this info is helpful.
> >>>
> >>> Thanks for the information.
> >>>
> >>> Also, I was told this patch has been deployed and shipped in easystack products for more than a year, and it works well.
> >>>
> >>> The above information makes me feel confident about this patch. I will submit it in the next merge window once my extended testing loop passes.
> >>>
> >>> Coly Li
> >>>
> >
>
> [snipped]
>




