On Tue, 28 May 2024, Coly Li wrote:

> If extremely heavy write I/O continuously hits a relatively small
> cache device (512GB in my testing), it is possible for the counter
> c->gc_stats.in_use to keep increasing until it exceeds CUTOFF_CACHE_ADD.
>
> If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
> requests will bypass the cache device because check_should_bypass()
> returns 'true'. Because all writes bypass the cache device, the counter
> c->sectors_to_gc has no chance to become negative, and the garbage
> collection thread won't be woken up even after the whole cache becomes
> clean once writeback completes. The aftermath is that all write I/Os go
> directly to the backing device even though the cache device is clean.
>
> To avoid the above situation, this patch uses a quite conservative fix:
> if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, the garbage
> collection thread is only woken up when the whole cache device is clean.

Nice fix. If I understand correctly, even with this fix, bcache can reach
a point where it must wait until garbage collection frees a bucket (via
force_wake_up_gc) before buckets can be used again.

If force_wake_up_gc is not called until `c->gc_stats.in_use` exceeds
CUTOFF_CACHE_ADD, gc may not respond as quickly as it could, and IO
latency is important. It may be a good idea to check
`c->gc_stats.in_use > CUTOFF_CACHE_ADD/2` so garbage collection starts
when the cache is half-way "full" (a rough sketch follows the quoted
hunk below). Reaching 50% is still quite conservative, but if you want
to wait longer, then even 80% or 90% would be fine; however, I think
100% is too far.

We want to avoid the case where bcache is completely "out" of buckets and
we have to wait for garbage collection latency before a cache bucket can
fill, since buckets should be available. For example, on our system we
have 736824 buckets available:

	# cat /sys/devices/virtual/block/dm-9/bcache/nbuckets
	736824

There should be no reason to wait until all buckets are exhausted.
Forcing garbage collection at 50% (368412 buckets "in use") would be
good housekeeping.

You know this code very well, so if I have misinterpreted something here,
then please fill me in on the details.

--
Eric Wheeler

> Before the fix, the writes-always-bypass situation happens after 10+
> hours of write I/O pressure on 512GB Intel Optane memory which acts as
> the cache device. After this fix, such a situation doesn't happen after
> 36+ hours of testing.
>
> Signed-off-by: Coly Li <colyli@xxxxxxx>
> ---
>  drivers/md/bcache/request.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 83d112bd2b1c..af345dc6fde1 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -369,10 +369,24 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
>  	struct io *i;
>  
>  	if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
> -	    c->gc_stats.in_use > CUTOFF_CACHE_ADD ||
>  	    (bio_op(bio) == REQ_OP_DISCARD))
>  		goto skip;
>  
> +	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD) {
> +		/*
> +		 * If cached buckets are all clean now, 'true' will be
> +		 * returned and all requests will bypass the cache device.
> +		 * Then c->sectors_to_gc has no chance to be negative, and
> +		 * gc thread won't wake up and caching won't work forever.
> +		 * Here call force_wake_up_gc() to avoid such aftermath.
> +		 */
> +		if (BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN &&
> +		    c->gc_mark_valid)
> +			force_wake_up_gc(c);
> +
> +		goto skip;
> +	}
> +
>  	if (mode == CACHE_MODE_NONE ||
>  	    (mode == CACHE_MODE_WRITEAROUND &&
>  	     op_is_write(bio_op(bio))))
> --
> 2.35.3
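
To make the CUTOFF_CACHE_ADD/2 suggestion above concrete, here is a rough
and untested sketch against this hunk. It only moves the force_wake_up_gc()
call to an earlier threshold and keeps the BDEV_STATE_CLEAN / c->gc_mark_valid
guard from your patch; whether that guard still makes sense at the lower
threshold is your call:

	/*
	 * Sketch only, untested: wake the gc thread once in_use passes
	 * half of CUTOFF_CACHE_ADD, but keep bypassing writes only when
	 * CUTOFF_CACHE_ADD itself is exceeded.
	 */
	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD / 2 &&
	    BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN &&
	    c->gc_mark_valid)
		force_wake_up_gc(c);

	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD)
		goto skip;

The point is only that gc gets a chance to run well before the bypass
cutoff is reached, so free buckets are already available instead of the
cache stalling on gc latency right at the cutoff.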