> 2023年1月12日 00:01,Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> 写道: > > Hi Coly, > thank you for taking the time to reply, I really hope you are feeling > better now. Thanks. But the recovery is really slow, and my response cannot be in time. I was told it might be better after 1 month, hope it is true. > > On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@xxxxxxx> wrote: >> >> On 9/8/22 4:32 PM, Andrea Tomassetti wrote: >>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001 >>> From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> >>> Date: Thu, 8 Sep 2022 09:47:55 +0200 >>> Subject: [PATCH] bcache: Add support for live resize of backing devices >>> >>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> >> >> Hi Andrea, >> >> I am just recovered from Omicron, and able to reply email. Let me place >> my comments inline. >> >> >>> --- >>> Hi Coly, >>> Here is the first version of the patch. There are some points I noted >>> down >>> that I would like to discuss with you: >>> - I found it pretty convenient to hook the call of the new added >>> function >>> inside the `register_bcache`. In fact, every time (at least from my >>> understandings) a disk changes size, it will trigger a new probe and, >>> thus, `register_bcache` will be triggered. The only inconvenient >>> is that, in case of success, the function will output >> >> The resize should be triggered manually, and not to do it automatically. >> >> You may create a sysfs file under the cached device's directory, name it >> as "extend_size" or something else you think better. >> >> Then the sysadmin may extend the cached device size explicitly on a >> predictable time. >> >>> `error: capacity changed` even if it's not really an error. >>> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize >>> `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a >>> problem, right? >>> - There is some reused code between this new function and >>> `bcache_device_init`. Maybe I can move `const size_t max_stripes` to >>> a broader scope or make it a macro, what do you think? >>> >>> Thank you very much, >>> Andrea >>> >>> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 74 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c >>> index ba3909bb6bea..9a77caf2a18f 100644 >>> --- a/drivers/md/bcache/super.c >>> +++ b/drivers/md/bcache/super.c >>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev) >>> return bch_is_open_cache(dev) || bch_is_open_backing(dev); >>> } >>> >>> +static bool bch_update_capacity(dev_t dev) >>> +{ >>> + const size_t max_stripes = min_t(size_t, INT_MAX, >>> + SIZE_MAX / sizeof(atomic_t)); >>> + >>> + uint64_t n, n_old; >>> + int nr_stripes_old; >>> + bool res = false; >>> + >>> + struct bcache_device *d; >>> + struct cache_set *c, *tc; >>> + struct cached_dev *dcp, *t, *dc = NULL; >>> + >>> + uint64_t parent_nr_sectors; >>> + >>> + list_for_each_entry_safe(c, tc, &bch_cache_sets, list) >>> + list_for_each_entry_safe(dcp, t, &c->cached_devs, list) >>> + if (dcp->bdev->bd_dev == dev) { >>> + dc = dcp; >>> + goto dc_found; >>> + } >>> + >>> +dc_found: >>> + if (!dc) >>> + return false; >>> + >>> + parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset; >>> + >>> + if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0)) >>> + return false; >>> + >> >> The above code only handles whole disk using as cached device. If a >> partition of a hard drive is used as cache device, and there are other >> data after this partition, such condition should be handled as well. So >> far I am fine to only extend size when using the whole hard drive as >> cached device, but more code is necessary to check and only permits >> size-extend for such condition. > I don't understand if there's a misalignment here so let me be more > clear: this patch is intended to support the live resize of *backing > devices*, is this what you mean with "cached device"? Yes, backing device is cached device. > When I was working on the patch I didn't consider the possibility of > live resizing the cache devices, but it should be trivial. > So, as far as I understand, a partition cannot be used as a backing > device, correct? The whole disk should be used as a backing device, > that's why I'm checking and that's why this check should be correct. > Am I wrong? Yes a partition can be used as a backing (cached) device. That is to say, if a file system is format on top of the cached device, this file system cannot be directly accessed via the partition. It has to be accessed via the bcache device e.g. /dev/bcache0. > >> >>> + if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)) >>> + return false; >> >> The above code should be done when all rested are set. >> >> >>> + >>> + d = &dc->disk; >>> + >>> + /* Force cached device sectors re-calc */ >>> + calc_cached_dev_sectors(d->c); >> >> Here c->cached_dev_sectors might be changed, if any of the following >> steps fails, it should be restored to previous value. >> >> >>> + >>> + /* Block writeback thread */ >>> + down_write(&dc->writeback_lock); >>> + nr_stripes_old = d->nr_stripes; >>> + n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size); >>> + if (!n || n > max_stripes) { >>> + pr_err("nr_stripes too large or invalid: %llu (start sector >>> beyond end of disk?)\n", >>> + n); >>> + goto unblock_and_exit; >>> + } >>> + d->nr_stripes = n; >>> + >>> + n = d->nr_stripes * sizeof(atomic_t); >>> + n_old = nr_stripes_old * sizeof(atomic_t); >>> + d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old, >>> + n, GFP_KERNEL); >>> + if (!d->stripe_sectors_dirty) >>> + goto unblock_and_exit; >>> + >>> + n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long); >>> + n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long); >>> + d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, >>> n, GFP_KERNEL); >>> + if (!d->full_dirty_stripes) >>> + goto unblock_and_exit; >> >> If kvrealloc() fails and NULL set to d->full_dirty_sripes, the previous >> array should be restored. >> >>> + >>> + res = true; >>> + >>> +unblock_and_exit: >>> + up_write(&dc->writeback_lock); >>> + return res; >>> +} >>> + >> >> I didn't test the patch, from the first glance, I feel the failure >> handling should restore all previous values, otherwise the cache device >> may be in a non-consistent state when extend size fails. >> > Completely agree with you on this point. I changed it locally and, as > soon as we agree on the other points I'll send a newer version of the > patch. >> >>> struct async_reg_args { >>> struct delayed_work reg_work; >>> char *path; >>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject >>> *k, struct kobj_attribute *attr, >>> mutex_lock(&bch_register_lock); >>> if (lookup_bdev(strim(path), &dev) == 0 && >>> bch_is_open(dev)) >>> - err = "device already registered"; >>> + if (bch_update_capacity(dev)) >>> + err = "capacity changed"; >>> + else >>> + err = "device already registered"; >> >> >> As I said, it should be a separated write-only sysfile under the cache >> device's directory. > Can I ask why you don't like the automatic resize way? Why should the > resize be manual? Most of system administrators don’t like such silently automatic things. They want to extend the size explicitly, especially when there is other dependences in their configurations. > Someone needs to trigger the block device resize, so shouldn't that be enough? Yes, an explicit write operation on a sysfs file to trigger the resize. Then we can permit people to do the size extend explicit and avoid to change current behavior. Thanks. Coly Li > > Andrea >> >> >>> else >>> err = "device busy"; >>> mutex_unlock(&bch_register_lock); >>> -- >>> 2.37.3 >>> >>