Re: [PATCH 2/3] bcache: call force_wake_up_gc() if necessary in check_should_bypass()

On Tue, 28 May 2024, Coly Li wrote:
> > On May 28, 2024, at 06:31, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
> > 
> > On Tue, 28 May 2024, Coly Li wrote:
> > 
> >> If extremely heavy write I/O continuously hits a relatively small
> >> cache device (512GB in my testing), it is possible for the counter
> >> c->gc_stats.in_use to keep increasing until it exceeds CUTOFF_CACHE_ADD.
> >> 
> >> If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
> >> requests will bypass the cache device because check_should_bypass()
> >> returns 'true'. Because all writes bypass the cache device, the counter
> >> c->sectors_to_gc never has a chance to go negative, and the garbage
> >> collection thread won't be woken up even after the whole cache becomes
> >> clean once writeback completes. The aftermath is that all write I/Os go
> >> directly to the backing device even though the cache device is clean.
> >> 
> >> To avoid the above situation, this patch uses a quite conservative fix:
> >> if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, only wake up the
> >> garbage collection thread when the whole cache device is clean.
> > 
> > Nice fix.
> > 
> > If I understand correctly, even with this fix, bcache can reach a point 
> > where it must wait until garbage collection frees a bucket (via 
> > force_wake_up_gc) before buckets can be used again.  Waiting to call 
> > force_wake_up_gc until `c->gc_stats.in_use` exceeds CUTOFF_CACHE_ADD means 
> > it may not respond as fast as it could, and IO latency is important.
> > 
> 
> CUTOFF_CACHE_ADD is not for this purpose.
> GC is triggered by c->sectors_to_gc, which works as follows:
> - initialized to 1/16 of the cache device size.
> - every allocation subtracts the allocated size from it.
> - once c->sectors_to_gc becomes negative, wake up the gc thread and reset the value to 1/16 of the cache device size.
> 
> CUTOFF_CACHE_ADD is to avoid something like a no-space deadlock in the 
> cache space. If more than CUTOFF_CACHE_ADD (95%) of the cache space is 
> allocated, no more cache space will be allocated and all reads/writes 
> will bypass and go directly to the backing device. In my testing, after 
> 10+ hours I can see c->gc_stats.in_use at 96%, which is a bit more than 
> 95%, but c->sectors_to_gc is still larger than 0. This is how the 
> forever-bypass happens. It has nothing to do with the latency of either 
> I/O or gc.

Understood, thank you for the explanation!
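
Just to confirm my mental model of the trigger path you describe, here is 
how I would sketch it in plain userspace C. The struct, field names, and 
helpers below are made up for illustration; they are not the real bcache 
code:

#include <stdint.h>
#include <stdio.h>

/*
 * Userspace model of the gc trigger path described above; all identifiers
 * here are illustrative, not the real bcache ones.
 */
struct cache_set_model {
	int64_t cache_sectors;   /* total cache device size, in 512-byte sectors */
	int64_t sectors_to_gc;   /* countdown; gc is woken once it goes negative */
	int     gc_wakeups;      /* how many times gc has been woken */
};

static void reset_sectors_to_gc(struct cache_set_model *c)
{
	/* "initialized to 1/16 of the cache device size" */
	c->sectors_to_gc = c->cache_sectors / 16;
}

static void wake_up_gc_model(struct cache_set_model *c)
{
	c->gc_wakeups++;
	reset_sectors_to_gc(c);
}

/* Only writes that are actually inserted into the cache reach this path. */
static void cached_write(struct cache_set_model *c, int64_t bio_sectors)
{
	c->sectors_to_gc -= bio_sectors;
	if (c->sectors_to_gc < 0)
		wake_up_gc_model(c);
}

int main(void)
{
	/* 512 GiB cache device; 1 GiB = 2097152 sectors */
	struct cache_set_model c = { .cache_sectors = 512LL * 2097152 };
	reset_sectors_to_gc(&c);

	for (int i = 0; i < 1000000; i++)	/* one million 1 MiB cached writes */
		cached_write(&c, 2048);

	/* expect roughly one wakeup per 32 GiB of cached writes */
	printf("gc woken %d times\n", c.gc_wakeups);
	return 0;
}

Once every write bypasses the cache, cached_write() above is never reached, 
so the counter never goes negative and gc is never woken, which matches the 
forever-bypass condition you describe.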

You said that this bug exists in older versions even though it is difficult 
to trigger. Perhaps it is a good idea to CC stable:

	Cc: stable@xxxxxxxxxxxxxxx

Also, 
	Reviewed-by: "Eric Wheeler" <bcache@xxxxxxxxxxxxxxxxxx>
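
For posterity, the condition I reviewed against is roughly the following 
plain-C sketch of what the commit message describes. The names and the 
whole_cache_clean flag are stand-ins I made up; this is not the actual diff:

#include <stdbool.h>
#include <stdio.h>

#define CUTOFF_CACHE_ADD_PCT 95		/* the 95% cutoff discussed above */

/* Illustrative stand-ins for the real bcache structures and helpers. */
struct cache_set_model {
	unsigned int in_use_percent;	/* models c->gc_stats.in_use */
	bool whole_cache_clean;		/* models "the whole cache device is clean" */
};

static void force_wake_up_gc_model(struct cache_set_model *c)
{
	printf("gc woken: in_use=%u%%, cache is clean\n", c->in_use_percent);
}

/*
 * Mirrors my reading of the fix: above the cutoff writes still bypass, but
 * if writeback has finished and the whole cache is clean, gc is poked so
 * buckets get reclaimed and in_use can drop back below the cutoff.
 */
static bool should_bypass_model(struct cache_set_model *c)
{
	if (c->in_use_percent > CUTOFF_CACHE_ADD_PCT) {
		if (c->whole_cache_clean)
			force_wake_up_gc_model(c);
		return true;	/* keep bypassing until gc frees space */
	}
	return false;
}

int main(void)
{
	struct cache_set_model c = { .in_use_percent = 96, .whole_cache_clean = true };
	printf("bypass=%d\n", should_bypass_model(&c));
	return 0;
}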

--
Eric Wheeler

> 
> 
> > It may be a good idea to do `c->gc_stats.in_use > CUTOFF_CACHE_ADD/2` to
> > start garbage collection when it is half-way "full".
> > 
> 
> No, it is not designed to work in this way. With the above change, all I/O would bypass the cache device and go directly to the backing device when only 50% of the cache device space is occupied.
> 
> 
> 
> > Reaching 50% is still quite conservative, but if you want to wait longer, 
> > then even 80% or 90% would be fine; however, I think 100% is too far.  We 
> > want to avoid the case where bcache is completely "out" of buckets and we 
> > have to wait for garbage collection latency before a cache bucket can 
> > fill, since buckets should be available.
> > 
> > For example on our system we have 736824 buckets available:
> > # cat /sys/devices/virtual/block/dm-9/bcache/nbuckets
> > 736824
> > 
> > There should be no reason to wait until all buckets are exhausted. Forcing 
> > garbage collection at 50% (368412 buckets "in use") would be good 
> > housekeeping.
> > 
> > You know this code very well so if I have misinterpreted something here, 
> > then please fill me in on the details.
> 
> As I said, this patch is just to avoid a forever-bypass condition, and this is an extreme condition which rarely happens for normal workloads.
> 
> Thanks.
> 
> Coly Li
