From: Eddie Chapman <eddie@xxxxxxxx>
Date: 2023-09-07 19:14:10
To: Coly Li <colyli@xxxxxxx>, Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
Cc: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>, linux-bcache@xxxxxxxxxxxxxxx, zoumingzhe@xxxxxx
Subject: Re: [PATCH] bcache: fixup init dirty data errors

> Coly Li wrote:
>>
>>> On 2023-08-24 at 20:49, Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx> wrote:
>>>
>>> From: Coly Li <colyli@xxxxxxx>
>>> Date: 2023-08-23 01:49:32
>>> To: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
>>> Cc: bcache@xxxxxxxxxxxxxxxxxx, linux-bcache@xxxxxxxxxxxxxxx, zoumingzhe@xxxxxx
>>> Subject: Re: [PATCH] bcache: fixup init dirty data errors
>>>
>>>> Hi Mingzhe,
>>>>
>>>> On Tue, Aug 22, 2023 at 06:19:58PM +0800, Mingzhe Zou wrote:
>>>>
>>>>> We found that after a long run, the dirty_data of the bcache device
>>>>> becomes inaccurate. This error cannot be eliminated unless the
>>>>> device is re-registered.
>>>>
>>>> Could you explain what the error was?
>>>>
>>> Hi, Coly
>>>
>>> We discovered that dirty_data was inaccurate a long time ago.
>>> After the writeback thread has flushed all dirty data, dirty_data via
>>> sysfs still shows anywhere from several KB to tens of MB of dirty data.
>>>
>>> At that time, I thought it might be a calculation error at runtime,
>>> but after reviewing the relevant code, no error was found.
>>>
>>> Last month, our online environment found that a certain device had
>>> more than 200G of dirty_data. This brings us back to the question.
>>>
>>>>> We also found that after a detach and re-attach this error can
>>>>> accumulate.
>>>>
>>>> Could you elaborate how the error can accumulate?
>>>>
>>> I found that when dirty_data has an error, a detach followed by a
>>> re-attach cannot eliminate the error, so the error persists.
>>>
>>> In bch_cached_dev_attach(), after bch_sectors_dirty_init(), the
>>> attach may still fail, but dirty_data is not cleared when it fails:
>>>
>>> ```
>>> bch_sectors_dirty_init(&dc->disk);
>>>
>>> ret = bch_cached_dev_run(dc);
>>> if (ret && (ret != -EBUSY)) {
>>> 	up_write(&dc->writeback_lock);
>>> 	/*
>>> 	 * bch_register_lock is held, bcache_device_stop() is not
>>> 	 * able to be directly called. The kthread and kworker
>>> 	 * created previously in bch_cached_dev_writeback_start()
>>> 	 * have to be stopped manually here.
>>> 	 */
>>> 	kthread_stop(dc->writeback_thread);
>>> 	dc->writeback_thread = NULL;
>>> 	cancel_writeback_rate_update_dwork(dc);
>>> 	pr_err("Couldn't run cached device %s", dc->backing_dev_name);
>>> 	return ret;
>>> }
>>> ```
>>>>
>>>>> In bch_sectors_dirty_init(), all keys with inode <= d->id will be
>>>>> recounted again. This is wrong; we only need to count the keys of
>>>>> the current device.
>>>>>
>>>>> Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be
>>>>> multithreaded")
>>>>> Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>

Hi, Eddie

The patch fixes up commit b144e45fc57649e15cbc79ff2d32a942af1d91d5.
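To make the accounting error concrete, below is a small stand-alone
sketch in plain userspace C. The key and counter structures are made up
for illustration and are not bcache code; the point is only to show how
a key walk that accepts every key with inode <= id (like the old loop)
re-adds another device's dirty keys, which is exactly the kind of error
that then survives and accumulates across detach/re-attach:

```
#include <stdio.h>

#define NR_DEVICES 2

/* Hypothetical stand-ins for bcache keys and per-device counters. */
struct key {
	unsigned int inode;	/* owning device id */
	unsigned int sectors;	/* dirty sectors recorded in this key */
};

static long dirty_sectors[NR_DEVICES];

/* Old behaviour: count every key with inode <= id (keys sorted by inode). */
static void dirty_init_buggy(unsigned int id, const struct key *keys, int n)
{
	for (int i = 0; i < n && keys[i].inode <= id; i++)
		dirty_sectors[keys[i].inode] += keys[i].sectors;
}

/* Fixed behaviour: skip keys that do not belong to the device being initialized. */
static void dirty_init_fixed(unsigned int id, const struct key *keys, int n)
{
	for (int i = 0; i < n && keys[i].inode <= id; i++) {
		if (keys[i].inode != id)
			continue;
		dirty_sectors[keys[i].inode] += keys[i].sectors;
	}
}

int main(void)
{
	const struct key keys[] = {
		{ .inode = 0, .sectors = 8 },
		{ .inode = 0, .sectors = 8 },
		{ .inode = 1, .sectors = 16 },
	};
	int n = sizeof(keys) / sizeof(keys[0]);

	/* Initialize device 0, then device 1, with the old walk. */
	dirty_init_buggy(0, keys, n);
	dirty_init_buggy(1, keys, n);
	/* Device 0 ends up with 32 instead of 16: its keys were counted twice. */
	printf("buggy: dev0=%ld dev1=%ld\n", dirty_sectors[0], dirty_sectors[1]);

	dirty_sectors[0] = dirty_sectors[1] = 0;
	dirty_init_fixed(0, keys, n);
	dirty_init_fixed(1, keys, n);
	/* Now dev0=16 and dev1=16, as expected. */
	printf("fixed: dev0=%ld dev1=%ld\n", dirty_sectors[0], dirty_sectors[1]);

	return 0;
}
```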
The following are all the tags containing this commit-id:

$ git tag --contains b144e45fc576
v5.10 v5.10-rc1 v5.10-rc2 v5.10-rc3 v5.10-rc4 v5.10-rc5 v5.10-rc6 v5.10-rc7
v5.11 v5.11-rc1 v5.11-rc2 v5.11-rc3 v5.11-rc4 v5.11-rc5 v5.11-rc6 v5.11-rc7
v5.12 v5.12-rc1-dontuse v5.12-rc2 v5.12-rc3 v5.12-rc4 v5.12-rc5 v5.12-rc6 v5.12-rc7 v5.12-rc8
v5.13 v5.13-rc1 v5.13-rc2 v5.13-rc3 v5.13-rc4 v5.13-rc5 v5.13-rc6 v5.13-rc7
v5.14 v5.14-rc1 v5.14-rc2 v5.14-rc3 v5.14-rc4 v5.14-rc5 v5.14-rc6 v5.14-rc7
v5.15 v5.15-rc1 v5.15-rc2 v5.15-rc3 v5.15-rc4 v5.15-rc5 v5.15-rc6 v5.15-rc7
v5.16 v5.16-rc1 v5.16-rc2 v5.16-rc3 v5.16-rc4 v5.16-rc5 v5.16-rc6 v5.16-rc7 v5.16-rc8
v5.17 v5.17-rc1 v5.17-rc2 v5.17-rc3 v5.17-rc4 v5.17-rc5 v5.17-rc6 v5.17-rc7 v5.17-rc8
v5.18 v5.18-rc1 v5.18-rc2 v5.18-rc3 v5.18-rc4 v5.18-rc5 v5.18-rc6 v5.18-rc7
v5.19 v5.19-rc1 v5.19-rc2 v5.19-rc3 v5.19-rc4 v5.19-rc5 v5.19-rc6 v5.19-rc7 v5.19-rc8
v5.7 v5.7-rc1 v5.7-rc2 v5.7-rc3 v5.7-rc4 v5.7-rc5 v5.7-rc6 v5.7-rc7
v5.8 v5.8-rc1 v5.8-rc2 v5.8-rc3 v5.8-rc4 v5.8-rc5 v5.8-rc6 v5.8-rc7
v5.9 v5.9-rc1 v5.9-rc2 v5.9-rc3 v5.9-rc4 v5.9-rc5 v5.9-rc6 v5.9-rc7 v5.9-rc8
v6.0 v6.0-rc1 v6.0-rc2 v6.0-rc3 v6.0-rc4 v6.0-rc5 v6.0-rc6 v6.0-rc7
v6.1 v6.1-rc1 v6.1-rc2 v6.1-rc3 v6.1-rc4 v6.1-rc5 v6.1-rc6 v6.1-rc7 v6.1-rc8
v6.2 v6.2-rc1 v6.2-rc2 v6.2-rc3 v6.2-rc4 v6.2-rc5 v6.2-rc6 v6.2-rc7 v6.2-rc8
v6.3 v6.3-rc1 v6.3-rc2 v6.3-rc3 v6.3-rc4 v6.3-rc5 v6.3-rc6 v6.3-rc7
v6.4 v6.4-rc1 v6.4-rc2 v6.4-rc3 v6.4-rc4 v6.4-rc5 v6.4-rc6 v6.4-rc7
v6.5 v6.5-rc1 v6.5-rc2 v6.5-rc3 v6.5-rc4 v6.5-rc5 v6.5-rc6 v6.5-rc7

>>>>> ---
>>>>> drivers/md/bcache/writeback.c | 7 ++++++-
>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>>>> index 24c049067f61..71d0dabcbf9d 100644
>>>>> --- a/drivers/md/bcache/writeback.c
>>>>> +++ b/drivers/md/bcache/writeback.c
>>>>> @@ -983,6 +983,8 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>>>>>  	struct cache_set *c = d->c;
>>>>>  	struct bch_dirty_init_state state;
>>>>>
>>>>> +	atomic_long_set(&d->dirty_sectors, 0);
>>>>> +
>>>>
>>>> The above change is not upstreamed yet; if you are fixing upstream
>>>> code, please avoid using d->dirty_sectors here.
>>>
>>> Yes, dirty_sectors is from a set of resize patches submitted to the
>>> community before; those patches have not been merged upstream. I will
>>> remove this line in v2.
>>>
>>> In fact, the change about dirty_sectors is only a prerequisite for
>>> resize, and it can be promoted first. It will greatly reduce the
>>> memory requirements of high-capacity devices.
>>>
>>>>>  	/* Just count root keys if no leaf node */
>>>>>  	rw_lock(0, c->root, c->root->level);
>>>>>  	if (c->root->level == 0) {
>>>>> @@ -991,8 +993,11 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>>>>>  		op.count = 0;
>>>>>
>>>>>  		for_each_key_filter(&c->root->keys,
>>>>> -				    k, &iter, bch_ptr_invalid)
>>>>> +				    k, &iter, bch_ptr_invalid) {
>>>>> +			if (KEY_INODE(k) != op.inode)
>>>>> +				continue;
>>>>>  			sectors_dirty_init_fn(&op.op, c->root, k);
>>>>> +		}
>>>>>
>>>>
>>>> Nice catch! IMHO if I take the above change, setting d->dirty_sectors
>>>> to 0 might be unnecessary in the ideal condition, am I right?
>>>
>>> bch_cached_dev_attach() may still fail and exit, so I think it is
>>> necessary to clear it to 0.
>>
>> Copied. Thanks for the information, I will take the v2 patch.
>>
>> Coly Li
>>
>
> Hi Coly, Mingzhe,
>
> Can I ask, how far back would this fix be needed, in terms of stable
> versions?
>
> Thanks,
> Eddie
>
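P.S. On the clear-to-0 point discussed above: since d->dirty_sectors is
not in mainline, an upstream-only variant of that idea would have to
reset the per-stripe accounting instead. A rough, untested sketch of
such a helper, assuming the stripe_sectors_dirty, full_dirty_stripes and
nr_stripes fields of struct bcache_device (this is not part of the
posted patch):

```
/*
 * Hypothetical helper (not part of the posted patch): drop all cached
 * dirty accounting of a bcache_device so that a later
 * bch_sectors_dirty_init(), or a failed attach, starts from zero.
 * Assumes the declarations from drivers/md/bcache/bcache.h.
 */
static void bch_sectors_dirty_reset(struct bcache_device *d)
{
	unsigned int i;

	for (i = 0; i < d->nr_stripes; i++)
		atomic_set(d->stripe_sectors_dirty + i, 0);

	bitmap_zero(d->full_dirty_stripes, d->nr_stripes);
}
```

Such a helper could be called at the top of bch_sectors_dirty_init() or
on the bch_cached_dev_attach() error path quoted earlier, which is the
case where the KEY_INODE() filter alone does not remove the stale
counts.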