Hi Coly, just a kind reminder for this patch. Thank you very much, Andrea On Mon, Sep 19, 2022 at 2:17 PM Coly Li <colyli@xxxxxxx> wrote: > > > > > 2022年9月19日 19:42,Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> 写道: > > > > Hi Coly, > > have you had any time to take a look at this? Do you prefer if I send the patch as a separate thread? > > > > Thank you very much, > > Andrea > > > Yes, it is on my queue, just after I finish my tasks on hand, I will take a look on it. > > Thanks. > > Coly Li > > > > > > On 8/9/22 10:32, Andrea Tomassetti wrote: > >> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001 > >> From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > >> Date: Thu, 8 Sep 2022 09:47:55 +0200 > >> Subject: [PATCH] bcache: Add support for live resize of backing devices > >> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx> > >> --- > >> Hi Coly, > >> Here is the first version of the patch. There are some points I noted down > >> that I would like to discuss with you: > >> - I found it pretty convenient to hook the call of the new added function > >> inside the `register_bcache`. In fact, every time (at least from my > >> understandings) a disk changes size, it will trigger a new probe and, > >> thus, `register_bcache` will be triggered. The only inconvenient > >> is that, in case of success, the function will output > >> `error: capacity changed` even if it's not really an error. > >> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize > >> `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a > >> problem, right? > >> - There is some reused code between this new function and > >> `bcache_device_init`. Maybe I can move `const size_t max_stripes` to > >> a broader scope or make it a macro, what do you think? > >> Thank you very much, > >> Andrea > >> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++- > >> 1 file changed, 74 insertions(+), 1 deletion(-) > >> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > >> index ba3909bb6bea..9a77caf2a18f 100644 > >> --- a/drivers/md/bcache/super.c > >> +++ b/drivers/md/bcache/super.c > >> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev) > >> return bch_is_open_cache(dev) || bch_is_open_backing(dev); > >> } > >> +static bool bch_update_capacity(dev_t dev) > >> +{ > >> + const size_t max_stripes = min_t(size_t, INT_MAX, > >> + SIZE_MAX / sizeof(atomic_t)); > >> + > >> + uint64_t n, n_old; > >> + int nr_stripes_old; > >> + bool res = false; > >> + > >> + struct bcache_device *d; > >> + struct cache_set *c, *tc; > >> + struct cached_dev *dcp, *t, *dc = NULL; > >> + > >> + uint64_t parent_nr_sectors; > >> + > >> + list_for_each_entry_safe(c, tc, &bch_cache_sets, list) > >> + list_for_each_entry_safe(dcp, t, &c->cached_devs, list) > >> + if (dcp->bdev->bd_dev == dev) { > >> + dc = dcp; > >> + goto dc_found; > >> + } > >> + > >> +dc_found: > >> + if (!dc) > >> + return false; > >> + > >> + parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset; > >> + > >> + if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0)) > >> + return false; > >> + > >> + if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)) > >> + return false; > >> + > >> + d = &dc->disk; > >> + > >> + /* Force cached device sectors re-calc */ > >> + calc_cached_dev_sectors(d->c); > >> + > >> + /* Block writeback thread */ > >> + down_write(&dc->writeback_lock); > >> + nr_stripes_old = d->nr_stripes; > >> + n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size); > >> + if (!n || n > max_stripes) { > >> + pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n", > >> + n); > >> + goto unblock_and_exit; > >> + } > >> + d->nr_stripes = n; > >> + > >> + n = d->nr_stripes * sizeof(atomic_t); > >> + n_old = nr_stripes_old * sizeof(atomic_t); > >> + d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old, > >> + n, GFP_KERNEL); > >> + if (!d->stripe_sectors_dirty) > >> + goto unblock_and_exit; > >> + > >> + n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long); > >> + n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long); > >> + d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL); > >> + if (!d->full_dirty_stripes) > >> + goto unblock_and_exit; > >> + > >> + res = true; > >> + > >> +unblock_and_exit: > >> + up_write(&dc->writeback_lock); > >> + return res; > >> +} > >> + > >> struct async_reg_args { > >> struct delayed_work reg_work; > >> char *path; > >> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, > >> mutex_lock(&bch_register_lock); > >> if (lookup_bdev(strim(path), &dev) == 0 && > >> bch_is_open(dev)) > >> - err = "device already registered"; > >> + if (bch_update_capacity(dev)) > >> + err = "capacity changed"; > >> + else > >> + err = "device already registered"; > >> else > >> err = "device busy"; > >> mutex_unlock(&bch_register_lock); > >> -- > >> 2.37.3 > >> On 6/9/22 15:22, Andrea Tomassetti wrote: > >>> Hi Coly, > >>> I have finally some time to work on the patch. I already have a first > >>> prototype that seems to work but, before sending it, I would like to > >>> ask you two questions: > >>> 1. Inspecting the code, I found that the only lock mechanism is the > >>> writeback_lock semaphore. Am I correct? > >>> 2. How can I effectively test my patch? So far I'm doing something like this: > >>> a. make-bcache --writeback -B /dev/vdb -C /dev/vdc > >>> b. mkfs.xfs /dev/bcache0 > >>> c. dd if=/dev/random of=/mnt/random bs=1M count=1000 > >>> d. md5sum /mnt/random | tee /mnt/random.md5 > >>> e. live resize the disk and repeat c. and d. > >>> f. umount/reboot/remount and check that the md5 hashes are correct > >>> > >>> Any suggestions? > >>> > >>> Thank you very much, > >>> Andrea > >>> > >>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote: > >>>> > >>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote: > >>>>> Hi Coly, > >>>>> In one of our previous emails you said that > >>>>>> Currently bcache doesn’t support cache or backing device resize > >>>>> > >>>>> I was investigating this point and I actually found a solution. I > >>>>> briefly tested it and it seems to work fine. > >>>>> Basically what I'm doing is: > >>>>> 1. Check if there's any discrepancy between the nr of sectors > >>>>> reported by the bcache backing device (holder) and the nr of sectors > >>>>> reported by its parent (slave). > >>>>> 2. If the number of sectors of the two devices are not the same, > >>>>> then call set_capacity_and_notify on the bcache device. > >>>>> 3. From user space, depending on the fs used, grow the fs with some > >>>>> utility (e.g. xfs_growfs) > >>>>> > >>>>> This works without any need of unmounting the mounted fs nor stopping > >>>>> the bcache backing device. > >>>> > >>>> Well done! +1, would love to see a patch for this! > >>>> > >>>> > >>>>> So my question is: am I missing something? Can this live resize cause > >>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on > >>>>> this? > >>>> > >>>> A while a go we looked into doing this. Here is the summary of our > >>>> findings, not sure if there are any other considerations: > >>>> > >>>> 1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger > >>>> resize on echo 1 >. > >>>> 2. Refactor the set_capacity() bits from bcache_device_init() into its > >>>> own function. > >>>> 3. Put locks around bcache_device.full_dirty_stripes and > >>>> bcache_device.stripe_sectors_dirty. Re-alloc+copy+free and zero the > >>>> new bytes at the end. Grep where bcache_device.full_dirty_stripes is > >>>> used and make sure it is locked appropriately, probably in the > >>>> writeback thread, maybe other places too. > >>>> > >>>> The cachedev's don't know anything about the bdev size, so (according to > >>>> Kent) they will "just work" by referencing new offsets that were > >>>> previously beyond the disk. (This is basically the same as resizing the > >>>> bdev and then unregister/re-register which is how we resize bdevs now.) > >>>> > >>>> As for resizing a cachedev, I've not looked at all---not sure about that > >>>> one. We always detach, resize, make-bcache and re-attach the new cache. > >>>> Maybe it is similarly simple, but haven't looked. > >>>> > >>>> > >>>> -- > >>>> Eric Wheeler > >>>> > >>>> > >>>> > >>>>> > >>>>> Kind regards, > >>>>> Andrea > >>>>> >