Hi Coly, thank you for taking the time to reply, I really hope you are feeling better now. On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@xxxxxxx> wrote: > > On 9/8/22 4:32 PM, Andrea Tomassetti wrote: > > From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001 > > From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > > Date: Thu, 8 Sep 2022 09:47:55 +0200 > > Subject: [PATCH] bcache: Add support for live resize of backing devices > > > > Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > > Hi Andrea, > > I am just recovered from Omicron, and able to reply email. Let me place > my comments inline. > > > > --- > > Hi Coly, > > Here is the first version of the patch. There are some points I noted > > down > > that I would like to discuss with you: > > - I found it pretty convenient to hook the call of the new added > > function > > inside the `register_bcache`. In fact, every time (at least from my > > understandings) a disk changes size, it will trigger a new probe and, > > thus, `register_bcache` will be triggered. The only inconvenient > > is that, in case of success, the function will output > > The resize should be triggered manually, and not to do it automatically. > > You may create a sysfs file under the cached device's directory, name it > as "extend_size" or something else you think better. > > Then the sysadmin may extend the cached device size explicitly on a > predictable time. > > > `error: capacity changed` even if it's not really an error. > > - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize > > `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a > > problem, right? > > - There is some reused code between this new function and > > `bcache_device_init`. Maybe I can move `const size_t max_stripes` to > > a broader scope or make it a macro, what do you think? > > > > Thank you very much, > > Andrea > > > > drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 74 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > > index ba3909bb6bea..9a77caf2a18f 100644 > > --- a/drivers/md/bcache/super.c > > +++ b/drivers/md/bcache/super.c > > @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev) > > return bch_is_open_cache(dev) || bch_is_open_backing(dev); > > } > > > > +static bool bch_update_capacity(dev_t dev) > > +{ > > + const size_t max_stripes = min_t(size_t, INT_MAX, > > + SIZE_MAX / sizeof(atomic_t)); > > + > > + uint64_t n, n_old; > > + int nr_stripes_old; > > + bool res = false; > > + > > + struct bcache_device *d; > > + struct cache_set *c, *tc; > > + struct cached_dev *dcp, *t, *dc = NULL; > > + > > + uint64_t parent_nr_sectors; > > + > > + list_for_each_entry_safe(c, tc, &bch_cache_sets, list) > > + list_for_each_entry_safe(dcp, t, &c->cached_devs, list) > > + if (dcp->bdev->bd_dev == dev) { > > + dc = dcp; > > + goto dc_found; > > + } > > + > > +dc_found: > > + if (!dc) > > + return false; > > + > > + parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset; > > + > > + if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0)) > > + return false; > > + > > The above code only handles whole disk using as cached device. If a > partition of a hard drive is used as cache device, and there are other > data after this partition, such condition should be handled as well. So > far I am fine to only extend size when using the whole hard drive as > cached device, but more code is necessary to check and only permits > size-extend for such condition. I don't understand if there's a misalignment here so let me be more clear: this patch is intended to support the live resize of *backing devices*, is this what you mean with "cached device"? When I was working on the patch I didn't consider the possibility of live resizing the cache devices, but it should be trivial. So, as far as I understand, a partition cannot be used as a backing device, correct? The whole disk should be used as a backing device, that's why I'm checking and that's why this check should be correct. Am I wrong? > > > + if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)) > > + return false; > > The above code should be done when all rested are set. > > > > + > > + d = &dc->disk; > > + > > + /* Force cached device sectors re-calc */ > > + calc_cached_dev_sectors(d->c); > > Here c->cached_dev_sectors might be changed, if any of the following > steps fails, it should be restored to previous value. > > > > + > > + /* Block writeback thread */ > > + down_write(&dc->writeback_lock); > > + nr_stripes_old = d->nr_stripes; > > + n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size); > > + if (!n || n > max_stripes) { > > + pr_err("nr_stripes too large or invalid: %llu (start sector > > beyond end of disk?)\n", > > + n); > > + goto unblock_and_exit; > > + } > > + d->nr_stripes = n; > > + > > + n = d->nr_stripes * sizeof(atomic_t); > > + n_old = nr_stripes_old * sizeof(atomic_t); > > + d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old, > > + n, GFP_KERNEL); > > + if (!d->stripe_sectors_dirty) > > + goto unblock_and_exit; > > + > > + n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long); > > + n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long); > > + d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, > > n, GFP_KERNEL); > > + if (!d->full_dirty_stripes) > > + goto unblock_and_exit; > > If kvrealloc() fails and NULL set to d->full_dirty_sripes, the previous > array should be restored. > > > + > > + res = true; > > + > > +unblock_and_exit: > > + up_write(&dc->writeback_lock); > > + return res; > > +} > > + > > I didn't test the patch, from the first glance, I feel the failure > handling should restore all previous values, otherwise the cache device > may be in a non-consistent state when extend size fails. > Completely agree with you on this point. I changed it locally and, as soon as we agree on the other points I'll send a newer version of the patch. > > > struct async_reg_args { > > struct delayed_work reg_work; > > char *path; > > @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject > > *k, struct kobj_attribute *attr, > > mutex_lock(&bch_register_lock); > > if (lookup_bdev(strim(path), &dev) == 0 && > > bch_is_open(dev)) > > - err = "device already registered"; > > + if (bch_update_capacity(dev)) > > + err = "capacity changed"; > > + else > > + err = "device already registered"; > > > As I said, it should be a separated write-only sysfile under the cache > device's directory. Can I ask why you don't like the automatic resize way? Why should the resize be manual? Someone needs to trigger the block device resize, so shouldn't that be enough? Andrea > > > > else > > err = "device busy"; > > mutex_unlock(&bch_register_lock); > > -- > > 2.37.3 > > >