Hi Eric, thank you for your two cents. On Fri, Jan 27, 2023 at 11:40 PM Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote: > > On Fri, 27 Jan 2023, Andrea Tomassetti wrote: > > > From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001 > > From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > > Date: Thu, 8 Sep 2022 09:47:55 +0200 > > Subject: [PATCH v2] bcache: Add support for live resize of backing devices > > > > Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > > --- > > Hi Coly, > > this is the second version of the patch. As you correctly pointed out, > > I implemented roll-back functionalities in case of error. > > I'm testing this funcionality using QEMU/KVM vm via libvirt. > > Here the steps: > > 1. make-bcache --writeback -B /dev/vdb -C /dev/vdc > > 2. mkfs.xfs /dev/bcache0 > > 3. mount /dev/bcache0 /mnt > > 3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000 > > 4. md5sum /mnt/random0 | tee /mnt/random0.md5 > > 5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size > > <new-size> > > 6. xfs_growfs /dev/bcache0 > > 6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5) > > 7. umount/reboot/remount and check that the md5 hashes are correct with > > md5sum -c /mnt/random?.md5 > > > > drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 83 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > > index ba3909bb6bea..1435a3f605f8 100644 > > --- a/drivers/md/bcache/super.c > > +++ b/drivers/md/bcache/super.c > > @@ -2443,6 +2443,85 @@ static bool bch_is_open(dev_t dev) > > return bch_is_open_cache(dev) || bch_is_open_backing(dev); > > } > > > > +static bool bch_update_capacity(dev_t dev) > > +{ > > + const size_t max_stripes = min_t(size_t, INT_MAX, > > + SIZE_MAX / sizeof(atomic_t)); > > + > > + uint64_t n, n_old, orig_cached_sectors = 0; > > + void *tmp_realloc; > > + > > + int nr_stripes_old; > > + bool res = false; > > + > > + struct bcache_device *d; > > + struct cache_set *c, *tc; > > + struct cached_dev *dcp, *t, *dc = NULL; > > + > > + uint64_t parent_nr_sectors; > > + > > + list_for_each_entry_safe(c, tc, &bch_cache_sets, list) > > + list_for_each_entry_safe(dcp, t, &c->cached_devs, list) > > + if (dcp->bdev->bd_dev == dev) { > > + dc = dcp; > > + goto dc_found; > > + } > > + > > +dc_found: > > + if (!dc) > > + return false; > > + > > + parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset; > > + > > + if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0)) > > + return false; > > + > > + d = &dc->disk; > > + orig_cached_sectors = d->c->cached_dev_sectors; > > + > > + /* Force cached device sectors re-calc */ > > + calc_cached_dev_sectors(d->c); > > + > > + /* Block writeback thread */ > > + down_write(&dc->writeback_lock); > > + nr_stripes_old = d->nr_stripes; > > + n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size); > > + if (!n || n > max_stripes) { > > + pr_err("nr_stripes too large or invalid: %llu (start sector > > beyond end of disk?)\n", > > + n); > > + goto restore_dev_sectors; > > + } > > + d->nr_stripes = n; > > It looks like there is some overlap between the new bch_update_capacity() > function and the existing bcache_device_init() function: This is something I already pointed out in the first version of my patch, some emails ago: - There is some reused code between this new function and `bcache_device_init`. Maybe I can move `const size_t max_stripes` to a broader scope or make it a macro, what do you think? > > https://github.com/torvalds/linux/blob/master/drivers/md/bcache/super.c#L909 > > IMHO, it would be ideal if bch_update_capacity() could also handle the > uninitialized state of full_dirty_stripes/stripe_sectors_dirty (and any > related bits) at bdev registration time so bcache_device_init() can call > bch_update_capacity(). Thus, bch_update_capacity() would replace the > set_capacity() call in bcache_device_init(). > Nice idea Regards, Andrea > If a call to bch_update_capacity() can set the size at registration or > resize as necessary then it would remove duplicate code and keep > initialization in only one place. > > I'll defer to what Coly thinks is best, just my $0.02 > > Cheers, > > > -- > Eric Wheeler > > > > > + > > + n = d->nr_stripes * sizeof(atomic_t); > > + n_old = nr_stripes_old * sizeof(atomic_t); > > + tmp_realloc = kvrealloc(d->stripe_sectors_dirty, n_old, > > + n, GFP_KERNEL); > > + if (!tmp_realloc) > > + goto restore_nr_stripes; > > + > > + d->stripe_sectors_dirty = (atomic_t *) tmp_realloc; > > + > > + n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long); > > + n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long); > > + tmp_realloc = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL); > > + if (!tmp_realloc) > > + goto restore_nr_stripes; > > + > > + d->full_dirty_stripes = (unsigned long *) tmp_realloc; > > + > > + if ((res = set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))) > > + goto unblock_and_exit; > > + > > +restore_nr_stripes: > > + d->nr_stripes = nr_stripes_old; > > +restore_dev_sectors: > > + d->c->cached_dev_sectors = orig_cached_sectors; > > +unblock_and_exit: > > + up_write(&dc->writeback_lock); > > + return res; > > +} > > + > > struct async_reg_args { > > struct delayed_work reg_work; > > char *path; > > @@ -2569,7 +2648,10 @@ static ssize_t register_bcache(struct kobject *k, > > struct kobj_attribute *attr, > > mutex_lock(&bch_register_lock); > > if (lookup_bdev(strim(path), &dev) == 0 && > > bch_is_open(dev)) > > - err = "device already registered"; > > + if (bch_update_capacity(dev)) > > + err = "capacity changed"; > > + else > > + err = "device already registered"; > > else > > err = "device busy"; > > mutex_unlock(&bch_register_lock); > > -- > > 2.39.0 > > > > > > > > On 25/1/23 18:59, Coly Li wrote: > > > > > > > > >> 2023年1月25日 18:07,Andrea Tomassetti > > >> <andrea.tomassetti-opensource@xxxxxxxx> 写道: > > >> > > >> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@xxxxxxx> wrote: > > >>>> > > > > > >>>>> > > >>>>>> struct async_reg_args { > > >>>>>> struct delayed_work reg_work; > > >>>>>> char *path; > > >>>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject > > >>>>>> *k, struct kobj_attribute *attr, > > >>>>>> mutex_lock(&bch_register_lock); > > >>>>>> if (lookup_bdev(strim(path), &dev) == 0 && > > >>>>>> bch_is_open(dev)) > > >>>>>> - err = "device already registered"; > > >>>>>> + if (bch_update_capacity(dev)) > > >>>>>> + err = "capacity changed"; > > >>>>>> + else > > >>>>>> + err = "device already registered"; > > >>>>> > > >>>>> > > >>>>> As I said, it should be a separated write-only sysfile under the cache > > >>>>> device's directory. > > >>>> Can I ask why you don't like the automatic resize way? Why should the > > >>>> resize be manual? > > >>> > > >>> Most of system administrators don’t like such silently automatic things. > > >>> They want to extend the size explicitly, especially when there is other > > >>> dependences in their configurations. > > >>> > > >> What I was trying to say is that, in order to resize a block device, a > > >> manual command should be executed. So, this is already a "non-silent" > > >> automatic thing. > > >> Moreover, if the block device has a FS on it, the FS needs to be > > >> manually grown with some special utilities, e.g. xfs_growfs. So, > > >> again, another non-silent automatic step. Don't you agree? > > >> For example, to resize a qcow device attached to a VM I'm manually > > >> doing a `virsh blockresize`. As soon as I issue that command, the > > >> virtio_blk driver inside the VM detects the disk size change and calls > > >> the `set_capacity_and_notify` function. Why then should bcache behave > > >> differently? > > > > > > The above VM example makes sense, I am almost convinced. > > > > > >> > > >> If you're concerned that this can somehow break the > > >> behaviour-compatibility with older versions of the driver, can we > > >> protect this automatic discovery with an optional parameter? Will this > > >> be an option you will take into account? > > > > > > Then let’s forget the option sysfs at this moment. Once you feel the patch > > > is ready for me to testing, please notice me with detailed steps to redo > > > your testing. > > > At that time during my testing, let’s discuss whether an extra option is > > > necesssary, for now just keep your idea as automatically resize the cached > > > device. > > > > > > Thanks for your detailed explanation. > > > > > > Coly Li > > > > > > >