Re: [PATCH 12/19] bcache: update bucket_in_use periodically

Coly Li <i@xxxxxxx> · Thu, 13 Jul 2017 12:27:15 +0800

On 2017/7/13 下午12:13, Eric Wheeler wrote:
> On Tue, 11 Jul 2017, Coly Li wrote:
> 
>> On 2017/7/11 下午1:39, tang.junhui@xxxxxxxxxx wrote:
>>> Compared to bucket depletion, resulting in hanging dead,
>>> It is worthy to consumes a little time to update the bucket_in_use.
>>> If you have any better solution, please show to us,
>>> We should solve it as soon as possible, not wait for it forever.
>>>
>>
>> I also test this patch on a cache device with 4x3.8TB size, all buckets
>> iteration takes around 40-50ms. If the iteration needs to hold
>> bucket_lock of cache set, it is very probably to introduce a huge I/O
>> latency in period of every 30 seconds.
>>
>> For database people, this is not good news.
> 
> 
> Hi Tang,
>    
> I'm waiting to queue this patch pending your response to Coly.  
> 
> Please send a v2 when you're ready.

Eric,

I guess Tang is working on the I/O hang issue during back ground garbage
collection running. From discussion from other email thread, it seems a
regular I/O request gets hung for 10+ second in some cases. Maybe that
issue is more urgent than this one.

>From my personal opinion, updating bucket_in_use is for acting garbage
collection. If number of bucket in use is not updated in time, garbage
collection won't start due to old bucket_in_use still beyond
CUTOFF_WRITEBACK_SYNC.

We may maintain an atomic counter per-cache set for dirty buckets, and
update it at some locations when allocating or reclaiming bucket. This
counter is unnecessary to be very accurate, just accurate enough for
should_writeback() working correctly.

I am also looking at it for a better solution as well.

Coly

>>
>> Coly
>>
>>
>>>
>>>
>>>
>>> 发件人:         Coly Li <i@xxxxxxx>
>>> 收件人:         linux-block@xxxxxxxxxxxxxxx, Tang Junhui
>>> <tang.junhui@xxxxxxxxxx>,
>>> 抄送:        bcache@xxxxxxxxxxxxxxxxxx, linux-bcache@xxxxxxxxxxxxxxx,
>>> hch@xxxxxxxxxxxxx, axboe@xxxxxxxxx
>>> 日期:         2017/07/11 13:06
>>> 主题:        Re: [PATCH 12/19] bcache: update bucket_in_use periodically
>>> 发件人:        linux-bcache-owner@xxxxxxxxxxxxxxx
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>> On 2017/7/1 上午4:43, bcache@xxxxxxxxxxxxxxxxxx wrote:
>>>> From: Tang Junhui <tang.junhui@xxxxxxxxxx>
>>>>
>>>> bucket_in_use is updated in gc thread which triggered by invalidating or
>>>> writing sectors_to_gc dirty data, It's been too long, Therefore, when we
>>>> use it to compare with the threshold, it is often not timely, which leads
>>>> to inaccurate judgment and often results in bucket depletion.
>>>>
>>>> Signed-off-by: Tang Junhui <tang.junhui@xxxxxxxxxx>
>>>> ---
>>>>  drivers/md/bcache/btree.c | 29 +++++++++++++++++++++++++++--
>>>>  1 file changed, 27 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>>> index 866dcf7..77aa20b 100644
>>>> --- a/drivers/md/bcache/btree.c
>>>> +++ b/drivers/md/bcache/btree.c
>>>> @@ -90,6 +90,8 @@
>>>>  #define MAX_NEED_GC                                  64
>>>>  #define MAX_SAVE_PRIO                                  72
>>>>  
>>>> +#define GC_THREAD_TIMEOUT_MS                 (30 * 1000)
>>>> +
>>>>  #define PTR_DIRTY_BIT                                  (((uint64_t) 1
>>> << 36))
>>>>  
>>>>  #define PTR_HASH(c, k)                                              
>>>                                                                         \
>>>> @@ -1760,6 +1762,23 @@ static void bch_btree_gc(struct cache_set *c)
>>>>                   bch_moving_gc(c);
>>>>  }
>>>>  
>>>> +void bch_update_bucket_in_use(struct cache_set *c)
>>>> +{
>>>> +                 struct cache *ca;
>>>> +                 struct bucket *b;
>>>> +                 unsigned i;
>>>> +                 size_t available = 0;
>>>> +
>>>> +                 for_each_cache(ca, c, i) {
>>>> +                                  for_each_bucket(b, ca) {
>>>> +                                                   if (!GC_MARK(b) ||
>>> GC_MARK(b) == GC_MARK_RECLAIMABLE)
>>>> +                                                                  
>>>  available++;
>>>> +                                  }
>>>> +                 }
>>>> +
>>>
>>> bucket_lock of cache set should be held before accessing buckets.
>>>
>>>
>>>> +                 c->gc_stats.in_use = (c->nbuckets - available) * 100
>>> / c->nbuckets;
>>>> +}
>>>> +
>>>>  static bool gc_should_run(struct cache_set *c)
>>>>  {
>>>>                   struct cache *ca;
>>>> @@ -1778,10 +1797,16 @@ static bool gc_should_run(struct cache_set *c)
>>>>  static int bch_gc_thread(void *arg)
>>>>  {
>>>>                   struct cache_set *c = arg;
>>>> +                 long  ret;
>>>> +                 unsigned long timeout =
>>> msecs_to_jiffies(GC_THREAD_TIMEOUT_MS);
>>>>  
>>>>                   while (1) {
>>>> -                                  wait_event_interruptible(c->gc_wait,
>>>> -                                                    
>>>  kthread_should_stop() || gc_should_run(c));
>>>> +                                  ret =
>>> wait_event_interruptible_timeout(c->gc_wait,
>>>> +                                                    
>>>  kthread_should_stop() || gc_should_run(c), timeout);
>>>> +                                  if (!ret) {
>>>> +                                                  
>>> bch_update_bucket_in_use(c);
>>>> +                                                   continue;
>>>
>>> A continue here will ignore status returned from kthread_should_stop(),
>>> which might not be expected behavior.
>>>
>>>
>>>> +                                  }
>>>>  
>>>>                                    if (kthread_should_stop())
>>>>                                                     break;
>>>>
>>>
>>> Iterating all buckets from the cache set requires bucket_lock to be
>>> held. Waiting for bucket_lock may take quite a long time for either
>>> bucket allocating code or bch_gc_thread(). What I concern is, this patch
>>> may introduce bucket allocation delay in period of GC_THREAD_TIMEOUT_MS.
>>>
>>> We need to find out a way to avoid such a performance regression.
>>>
>>> -- 
>>> Coly Li
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>>
>> -- 
>> Coly Li

-- 
Coly Li