> On May 29, 2024, at 08:16, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 28 May 2024, Coly Li wrote:
>>> On May 28, 2024, at 06:31, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, 28 May 2024, Coly Li wrote:
>>>
>>>> If extremely heavy write I/O continuously hits a relatively small
>>>> cache device (512GB in my testing), it is possible for the counter
>>>> c->gc_stats.in_use to keep increasing and exceed CUTOFF_CACHE_ADD.
>>>>
>>>> If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
>>>> requests will bypass the cache device because check_should_bypass()
>>>> returns 'true'. Because all writes bypass the cache device, the counter
>>>> c->sectors_to_gc has no chance to become negative, and the garbage
>>>> collection thread won't be woken up even after the whole cache becomes
>>>> clean once writeback completes. The aftermath is that all write I/Os go
>>>> directly to the backing device even though the cache device is clean.
>>>>
>>>> To avoid the above situation, this patch uses a quite conservative
>>>> fix: if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, the garbage
>>>> collection thread is only woken up when the whole cache device is clean.
>>>
>>> Nice fix.
>>>
>>> If I understand correctly, even with this fix, bcache can reach a point
>>> where it must wait until garbage collection frees a bucket (via
>>> force_wake_up_gc) before buckets can be used again. Waiting to call
>>> force_wake_up_gc until `c->gc_stats.in_use` exceeds CUTOFF_CACHE_ADD may
>>> not respond as fast as it could, and IO latency is important.
>>>
>>
>> CUTOFF_CACHE_ADD is not for this purpose.
>> GC is triggered by c->sectors_to_gc, which works as follows:
>> - it is initialized to 1/16 of the cache device size.
>> - every allocation subtracts the allocated size from it.
>> - once c->sectors_to_gc becomes negative, the gc thread is woken up and
>>   the value is reset to 1/16 of the cache device size.
>>
>> CUTOFF_CACHE_ADD is to avoid something like a no-space deadlock in the
>> cache space. If more than CUTOFF_CACHE_ADD (95%) of the cache space is
>> allocated, no more cache space will be allocated and all reads/writes
>> will bypass and go directly to the backing device. In my testing, after
>> 10+ hours I can see c->gc_stats.in_use is 96%, which is a bit more than
>> 95%, but c->sectors_to_gc is still larger than 0. This is how the
>> forever-bypass happens. It has nothing to do with the latency of either
>> I/O or gc.
>
> Understood, thank you for the explanation!
>
> You said that this bug exists in older versions even though it is
> difficult to trigger. Perhaps it is a good idea to CC stable:
>
> Cc: stable@xxxxxxxxxxxxxxx
>

It is unnecessary to Cc stable; manually writing the sysfs file trigger_gc
can avoid the always-bypass situation. This patch makes the default
configuration work without extra operations.

> Also,
> Reviewed-by: "Eric Wheeler" <bcache@xxxxxxxxxxxxxxxxxx>

Thanks for the review. Maybe it is a bit late to add the tag; I just found
that this series is already in the linux-block tree and will show up in the
next -rc quite soon.

Thanks.

Coly Li

[snipped]
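
To make the mechanism described above concrete, here is a minimal
userspace sketch, not the actual bcache code; the struct and function
names (cache_set_sim, wake_up_gc_sim, on_cache_alloc_sim,
maybe_force_wake_up_gc_sim, dirty_sectors) are illustrative assumptions.
It shows how c->sectors_to_gc normally triggers gc on every allocation,
and how the conservative fix forces a wake-up only once the over-full
cache has become clean:

/*
 * Userspace sketch only, not the real kernel code.  Field and function
 * names are simplified assumptions for illustration.
 */
#include <stdio.h>

#define CACHE_SECTORS     (512ULL * 1024 * 1024 * 1024 / 512) /* 512GB cache */
#define CUTOFF_CACHE_ADD  95                                  /* percent */

struct cache_set_sim {
	long long sectors_to_gc;           /* gc is woken when this goes negative */
	unsigned int in_use;               /* percent of cache space in use       */
	unsigned long long dirty_sectors;  /* 0 means the cache is clean          */
};

static void wake_up_gc_sim(struct cache_set_sim *c)
{
	/* reset the trigger to 1/16 of the cache size, as described above */
	c->sectors_to_gc = CACHE_SECTORS / 16;
	printf("gc thread woken, in_use=%u%%\n", c->in_use);
}

/* normal path: called on every allocation of @sectors from the cache */
static void on_cache_alloc_sim(struct cache_set_sim *c, long long sectors)
{
	c->sectors_to_gc -= sectors;
	if (c->sectors_to_gc < 0)
		wake_up_gc_sim(c);
}

/*
 * The conservative fix: when writes bypass because in_use exceeds 95%,
 * force a gc wake-up only once writeback has made the cache clean.
 */
static void maybe_force_wake_up_gc_sim(struct cache_set_sim *c)
{
	if (c->in_use > CUTOFF_CACHE_ADD && c->dirty_sectors == 0)
		wake_up_gc_sim(c);
}

int main(void)
{
	struct cache_set_sim c = {
		.sectors_to_gc = CACHE_SECTORS / 16,
		.in_use = 96,          /* the state observed in the report */
		.dirty_sectors = 0,    /* writeback finished, cache clean  */
	};

	/* normal path: allocations drain sectors_to_gc until it goes negative */
	on_cache_alloc_sim(&c, CACHE_SECTORS / 16 + 1);

	/*
	 * Forever-bypass scenario: all writes bypass, so on_cache_alloc_sim()
	 * is no longer called and sectors_to_gc never goes negative; only the
	 * new clean-cache check can wake gc here.
	 */
	maybe_force_wake_up_gc_sim(&c);
	return 0;
}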