On Tue, 28 May 2024, Coly Li wrote:

> If extremely heavy write I/O continuously hits a relatively small
> cache device (512GB in my testing), it is possible for the counter
> c->gc_stats.in_use to keep increasing until it exceeds CUTOFF_CACHE_ADD.
>
> If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
> requests will bypass the cache device because check_should_bypass()
> returns 'true'. Because all writes bypass the cache device, the counter
> c->sectors_to_gc has no chance to become negative, and the garbage
> collection thread won't be woken up even after the whole cache becomes
> clean once writeback completes. The aftermath is that all write I/Os go
> directly to the backing device even though the cache device is clean.
>
> To avoid the above situation, this patch uses a quite conservative fix:
> if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, the garbage
> collection thread is only woken up when the whole cache device is clean.

Nice fix. If I understand correctly, even with this fix, bcache can reach
a point where it must wait until garbage collection frees a bucket (via
force_wake_up_gc) before buckets can be used again.

If force_wake_up_gc is not called until `c->gc_stats.in_use` exceeds
CUTOFF_CACHE_ADD, gc may not respond as quickly as it could, and IO
latency is important. It may be a good idea to check
`c->gc_stats.in_use > CUTOFF_CACHE_ADD/2` so garbage collection starts
when the cache is half-way "full" (a rough sketch follows the quoted
hunk below). Reaching 50% is still quite conservative, but if you want
to wait longer, then even 80% or 90% would be fine; however, I think
100% is too far.

We want to avoid the case where bcache is completely "out" of buckets and
we have to wait for garbage collection latency before a cache bucket can
fill, since buckets should be available. For example, on our system we
have 736824 buckets available:

	# cat /sys/devices/virtual/block/dm-9/bcache/nbuckets
	736824

There should be no reason to wait until all buckets are exhausted.
Forcing garbage collection at 50% (368412 buckets "in use") would be
good housekeeping.

You know this code very well, so if I have misinterpreted something here,
then please fill me in on the details.

--
Eric Wheeler

> Before the fix, the writes-always-bypass situation happens after 10+
> hours of write I/O pressure on 512GB Intel Optane memory which acts as
> the cache device. After this fix, such a situation doesn't happen after
> 36+ hours of testing.
>
> Signed-off-by: Coly Li <colyli@xxxxxxx>
> ---
>  drivers/md/bcache/request.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 83d112bd2b1c..af345dc6fde1 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -369,10 +369,24 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
>  	struct io *i;
>  
>  	if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
> -	    c->gc_stats.in_use > CUTOFF_CACHE_ADD ||
>  	    (bio_op(bio) == REQ_OP_DISCARD))
>  		goto skip;
>  
> +	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD) {
> +		/*
> +		 * If cached buckets are all clean now, 'true' will be
> +		 * returned and all requests will bypass the cache device.
> +		 * Then c->sectors_to_gc has no chance to be negative, and
> +		 * gc thread won't wake up and caching won't work forever.
> +		 * Here call force_wake_up_gc() to avoid such aftermath.
> +		 */
> +		if (BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN &&
> +		    c->gc_mark_valid)
> +			force_wake_up_gc(c);
> +
> +		goto skip;
> +	}
> +
>  	if (mode == CACHE_MODE_NONE ||
>  	    (mode == CACHE_MODE_WRITEAROUND &&
>  	     op_is_write(bio_op(bio))))
> --
> 2.35.3
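
To make the CUTOFF_CACHE_ADD/2 suggestion above concrete, here is a rough
and untested sketch against this hunk. It only moves the force_wake_up_gc()
call to an earlier threshold and keeps the BDEV_STATE_CLEAN / c->gc_mark_valid
guard from your patch; whether that guard still makes sense at the lower
threshold is your call:

	/*
	 * Sketch only, untested: wake the gc thread once in_use passes
	 * half of CUTOFF_CACHE_ADD, but keep bypassing writes only when
	 * CUTOFF_CACHE_ADD itself is exceeded.
	 */
	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD / 2 &&
	    BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN &&
	    c->gc_mark_valid)
		force_wake_up_gc(c);

	if (c->gc_stats.in_use > CUTOFF_CACHE_ADD)
		goto skip;

The point is only that gc gets a chance to run well before the bypass
cutoff is reached, so free buckets are already available instead of the
cache stalling on gc latency right at the cutoff.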