On Fri, 12 May 2023, Coly Li wrote:
> On Thu, May 11, 2023 at 04:10:51PM -0700, Eric Wheeler wrote:
> > On Tue, 9 May 2023, Adriano Silva wrote:
> > > I got the parameters with this script, although I also checked /sys;
> > > doing the math, everything is right.
> > >
> > > https://gist.github.com/damoxc/6267899
> >
> > Thanks. prio_stats gives me what I'm looking for. More below.
> >
> > > On Monday, May 8, 2023 at 21:42:26 BRT, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, 4 May 2023, Coly Li wrote:
> > > > > On May 3, 2023, at 04:34, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, 20 Apr 2023, Adriano Silva wrote:
> > > > >> I continue to investigate the situation. There is actually a performance
> > > > >> gain when the bcache device is only half filled versus full. There is a
> > > > >> reduction and greater stability in the latency of direct writes, and this
> > > > >> improves my scenario.
> > > > >
> > > > > Hi Coly, have you been able to look at this?
> > > > >
> > > > > This sounds like a great optimization, and Adriano is in a place to test
> > > > > this now and report his findings.
> > > > >
> > > > > I think you said this should be a simple hack to add early reclaim, so
> > > > > maybe you can throw a quick patch together (even a rough first pass with
> > > > > hard-coded reclaim values).
> > > > >
> > > > > If we can get back to Adriano quickly, then he can test while he has an
> > > > > easy-to-reproduce environment. Indeed, this could benefit all bcache
> > > > > users.
> > > >
> > > > My current to-do list on hand is a little bit long. Yes, I'd like and
> > > > plan to do it, but the response time cannot be estimated.
> > >
> > > I understand. Maybe I can put something together if you can provide some
> > > pointers, since you are _the_ expert on bcache these days.
> > > Here are a few questions:
> > >
> > > Q's for Coly:
> > >
> > > It _looks_ like bcache frees buckets while the `ca->free_inc` list is
> > > full, but it could go further. Consider this hypothetical:
>
> Hi Eric,
>
> Bcache starts to invalidate buckets when ca->free_inc is full, and selects
> the buckets to invalidate via the replacement policy. It then continues to
> call bch_invalidate_one_bucket() and pushes each invalidated bucket into
> ca->free_inc until the list is full or there are no more candidate buckets
> to invalidate.

Understood. The goal: in an attempt to work around Adriano's performance
issue, we wish to invalidate buckets even after free_inc is full. If we can
keep ~20% of buckets unused (ie, !GC_SECTORS_USED(b)), then I think it will
fix his issue. That is the theory we wish to test and validate.

> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/md/bcache/alloc.c#L179
> >
> > static void invalidate_buckets_lru(struct cache *ca)
> > {
> > ...
>
> The mutex_lock()/unlock may introduce a deadlock. Before invalidate_buckets()
> is called, after allocator_wait() returns, the mutex lock bucket_lock is
> held again.

I see what you mean. Maybe the bucket lock is already held; if so, then I
don't need to grab it again. For now I've pulled the mutex_lock lines for
discussion. We only use for_each_bucket() to get a "fuzzy" count of
`available` buckets (pseudo code updated below). It doesn't need to be exact.

Here is some cleaned up and concise pseudo code for discussion (I've not yet
compile tested):

+	int available = 0;
+
+	//mutex_lock(&ca->set->bucket_lock);
+	for_each_bucket(b, ca) {
+		if (GC_MARK(b) == GC_MARK_RECLAIMABLE)
+			available++;
+	}
+	//mutex_unlock(&ca->set->bucket_lock);
+
-	while (!fifo_full(&ca->free_inc)) {
+	while (!fifo_full(&ca->free_inc) || available < TARGET_AVAIL_BUCKETS) {
		...
		bch_invalidate_one_bucket(ca, b);   <<<< this does the work
+		available++;
	}

Changes from the previous post:

- `available` was not incremented; now it is, so the loop can terminate.
- Removed the other counters for clarity; we only care about
  GC_MARK_RECLAIMABLE for this discussion.
- Ignore locking for now.

(TARGET_AVAIL_BUCKETS is a placeholder; ultimately it would be a sysfs
setting, probably a percentage.)

> If ca->free_inc is full and you still try to invalidate more candidate
> buckets, the next selected bucket (by the heap_pop) will be invalidated
> in bch_invalidate_one_bucket() and pushed into ca->free_inc. But now
> ca->free_inc is full, so the next time invalidate_buckets_lru() is called,
> this already-invalidated bucket will be accessed and checked again in
> for_each_bucket(). This is just a waste of CPU cycles.

True. I was aware of this performance issue when I wrote that; bcache takes
~1s to iterate for_each_bucket() on my system. Right now we just want to
keep ~20% of buckets completely unused and verify correctness... and then I
can work on hiding the bucket-counting overhead caused by for_each_bucket().

> Furthermore, __bch_invalidate_one_bucket() will increase the bucket's gen
> number and its pin counter. Doing this without pushing the bucket into
> ca->free_inc makes me feel uncomfortable.

Questions for my understanding:

- Is free_inc just a reserve list, such that most buckets are in the heap
  after a fresh `make-bcache -C <cdev>`?

- What is the difference between buckets in free_inc and buckets in the
  heap? Do they overlap?

I assume you mean this:

	void __bch_invalidate_one_bucket(...) {
		...
		bch_inc_gen(ca, b);
		b->prio = INITIAL_PRIO;
		atomic_inc(&b->pin);

If I understand the internals of bcache, the `gen` is just a counter that
increments to make the bucket "newer" than any other referenced version.
Incrementing the `gen` on an unused bucket should be safe, but please
correct me if I'm wrong here.
I'm not familiar with b->pin; it doesn't appear to be commented in
`struct bucket`, and I didn't see it used in bch_allocator_push(). What is
b->pin used for?

> > Coly, would this work?
>
> It should work on some kinds of workloads, but will introduce complaints
> for other kinds of workloads.

As this is written above, I agree. Right now I'm just trying to understand
the code well enough to free buckets preemptively so allocation doesn't
happen during IO. For now, please ignore the cost of for_each_bucket().

> > Can you think of any serious issues with this (besides the fact that
> > for_each_bucket is slow)?
>
> I don't feel this change will help bcache invalidate the clean buckets
> without extra cost.

For the example pseudo code above that is true, and for now I am _not_
trying to address performance.

> It is not simple for me to tell a solution without careful thought; this
> is a tradeoff of gain and pay...

Certainly that is the end goal, but first I need to understand the code
well enough to invalidate buckets down to 20% free and still maintain
correctness.

Thanks for your help understanding this.

-Eric

> Thanks.
>
> [snipped]
>
> --
> Coly Li
>