> On Aug 24, 2023, at 20:49, 邹明哲 <mingzhe.zou@xxxxxxxxxxxx> wrote:
>
> From: Coly Li <colyli@xxxxxxx>
> Date: 2023-08-23 01:49:32
> To: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
> Cc: bcache@xxxxxxxxxxxxxxxxxx, linux-bcache@xxxxxxxxxxxxxxx, zoumingzhe@xxxxxx
> Subject: Re: [PATCH] bcache: fixup init dirty data errors
>
>> Hi Mingzhe,
>>
>> On Tue, Aug 22, 2023 at 06:19:58PM +0800, Mingzhe Zou wrote:
>>> We found that after a long run, the dirty_data of the bcache device
>>> will have errors. This error cannot be eliminated unless the device
>>> is re-registered.
>>
>> Could you explain what the error was?
>>
> Hi, Coly
>
> We discovered that dirty_data was inaccurate a long time ago.
> After the writeback thread flushes all dirty data, dirty_data in sysfs
> still shows several KB to tens of MB of dirty data.
>
> At that time, I thought it might be a calculation error at runtime, but
> after reviewing the relevant code, no error was found.
>
> Last month, our online environment found that a certain device had more
> than 200G of dirty_data. This brought us back to the question.
>
>>>
>>> We also found that when reattaching after a detach, this error can
>>> accumulate.
>>>
>>
>> Could you elaborate how the error can accumulate?
>>
> I found that when dirty_data has an error, detaching and then re-attaching
> cannot eliminate it, so the error persists and accumulates.
>
> In bch_cached_dev_attach(), the attach may still fail after
> bch_sectors_dirty_init(), but dirty_data is not cleared on that failure
> path:
> ```
> 	bch_sectors_dirty_init(&dc->disk);
>
> 	ret = bch_cached_dev_run(dc);
> 	if (ret && (ret != -EBUSY)) {
> 		up_write(&dc->writeback_lock);
> 		/*
> 		 * bch_register_lock is held, bcache_device_stop() is not
> 		 * able to be directly called. The kthread and kworker
> 		 * created previously in bch_cached_dev_writeback_start()
> 		 * have to be stopped manually here.
> 		 */
> 		kthread_stop(dc->writeback_thread);
> 		dc->writeback_thread = NULL;
> 		cancel_writeback_rate_update_dwork(dc);
> 		pr_err("Couldn't run cached device %s",
> 		       dc->backing_dev_name);
> 		return ret;
> 	}
> ```
>
>>
>>> In bch_sectors_dirty_init(), all keys with inode <= d->id are counted
>>> again. This is wrong; we only need to count the keys of the current
>>> device.
>>>
>>> Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
>>> Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
>>> ---
>>>  drivers/md/bcache/writeback.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 24c049067f61..71d0dabcbf9d 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -983,6 +983,8 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>>>  	struct cache_set *c = d->c;
>>>  	struct bch_dirty_init_state state;
>>>
>>> +	atomic_long_set(&d->dirty_sectors, 0);
>>> +
>>
>> The above change is not upstreamed yet. If you are fixing upstream code,
>> please avoid using d->dirty_sectors here.
>>
>
> Yes, dirty_sectors comes from the set of resize patches I submitted to the
> community earlier; those patches have not been merged upstream, so I will
> remove this line in v2.
>
> In fact, the dirty_sectors change is only a prerequisite for resize, and
> it could be promoted on its own first. It would greatly reduce the memory
> requirements of high-capacity devices.
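>
> The idea, very roughly, is to maintain one device-wide counter instead of
> summing per-stripe counts whenever the total is needed. A simplified
> sketch only, not the actual patch series; apart from the dirty_sectors
> field itself, the names and details here are illustrative and may differ
> in review:
> ```
> /*
>  * Simplified sketch, not the actual resize patches: one device-wide
>  * atomic counter, updated wherever dirty sectors are added or removed,
>  * so reading the total is a single atomic_long_read() rather than a
>  * walk over per-stripe counters.
>  */
> struct bcache_device {
> 	/* ... existing fields ... */
> 	atomic_long_t	dirty_sectors;	/* total dirty sectors on this device */
> };
>
> static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
> {
> 	return atomic_long_read(&d->dirty_sectors);
> }
> ```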
>
>>
>>>  	/* Just count root keys if no leaf node */
>>>  	rw_lock(0, c->root, c->root->level);
>>>  	if (c->root->level == 0) {
>>> @@ -991,8 +993,11 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>>>  		op.count = 0;
>>>
>>>  		for_each_key_filter(&c->root->keys,
>>> -				    k, &iter, bch_ptr_invalid)
>>> +				    k, &iter, bch_ptr_invalid) {
>>> +			if (KEY_INODE(k) != op.inode)
>>> +				continue;
>>>  			sectors_dirty_init_fn(&op.op, c->root, k);
>>> +		}
>>>
>>
>> Nice catch! IMHO, if I take the above change, setting d->dirty_sectors to
>> 0 might be unnecessary in the ideal case, am I right?
>>
>
> bch_cached_dev_attach() may still fail and exit after
> bch_sectors_dirty_init(), so I think clearing it to 0 is necessary.

Copied. Thanks for the information, I will take the v2 patch.

Coly Li