Re: [RFC] Live resize of backing device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>
Date: Thu, 8 Sep 2022 09:47:55 +0200
Subject: [PATCH] bcache: Add support for live resize of backing devices

Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>
Hi Coly,
Here is the first version of the patch. There are some points I noted down
that I would like to discuss with you:
 - I found it pretty convenient to hook the call of the new added function
   inside the `register_bcache`. In fact, every time (at least from my
   understandings) a disk changes size, it will trigger a new probe and,
   thus, `register_bcache` will be triggered. The only inconvenient
   is that, in case of success, the function will output
   `error: capacity changed` even if it's not really an error.
 - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
   problem, right?
 - There is some reused code between this new function and
   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
   a broader scope or make it a macro, what do you think?

Thank you very much,

 drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ba3909bb6bea..9a77caf2a18f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
 	return bch_is_open_cache(dev) || bch_is_open_backing(dev);

+static bool bch_update_capacity(dev_t dev)
+	const size_t max_stripes = min_t(size_t, INT_MAX,
+					 SIZE_MAX / sizeof(atomic_t));
+	uint64_t n, n_old;
+	int nr_stripes_old;
+	bool res = false;
+	struct bcache_device *d;
+	struct cache_set *c, *tc;
+	struct cached_dev *dcp, *t, *dc = NULL;
+	uint64_t parent_nr_sectors;
+	list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
+		list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
+			if (dcp->bdev->bd_dev == dev) {
+				dc = dcp;
+				goto dc_found;
+			}
+	if (!dc)
+		return false;
+	parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
+	if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
+		return false;
+	if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
+		return false;
+	d = &dc->disk;
+	/* Force cached device sectors re-calc */
+	calc_cached_dev_sectors(d->c);
+	/* Block writeback thread */
+	down_write(&dc->writeback_lock);
+	nr_stripes_old = d->nr_stripes;
+	n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
+	if (!n || n > max_stripes) {
+ pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
+			n);
+		goto unblock_and_exit;
+	}
+	d->nr_stripes = n;
+	n = d->nr_stripes * sizeof(atomic_t);
+	n_old = nr_stripes_old * sizeof(atomic_t);
+	d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
+		n, GFP_KERNEL);
+	if (!d->stripe_sectors_dirty)
+		goto unblock_and_exit;
+	n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
+	n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
+ d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
+	if (!d->full_dirty_stripes)
+		goto unblock_and_exit;
+	res = true;
+	up_write(&dc->writeback_lock);
+	return res;
 struct async_reg_args {
 	struct delayed_work reg_work;
 	char *path;
@@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			if (lookup_bdev(strim(path), &dev) == 0 &&
-				err = "device already registered";
+				if (bch_update_capacity(dev))
+					err = "capacity changed";
+				else
+					err = "device already registered";
 				err = "device busy";

On 6/9/22 15:22, Andrea Tomassetti wrote:
Hi Coly,
I have finally some time to work on the patch. I already have a first
prototype that seems to work but, before sending it, I would like to
ask you two questions:
   1. Inspecting the code, I found that the only lock mechanism is the
writeback_lock semaphore. Am I correct?
   2. How can I effectively test my patch? So far I'm doing something like this:
      a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
      b. mkfs.xfs /dev/bcache0
      c. dd if=/dev/random of=/mnt/random bs=1M count=1000
      d. md5sum /mnt/random | tee /mnt/random.md5
      e. live resize the disk and repeat c. and d.
      f. umount/reboot/remount and check that the md5 hashes are correct

Any suggestions?

Thank you very much,

On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:

On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
Hi Coly,
In one of our previous emails you said that
Currently bcache doesn’t support cache or backing device resize

I was investigating this point and I actually found a solution. I
briefly tested it and it seems to work fine.
Basically what I'm doing is:
   1. Check if there's any discrepancy between the nr of sectors
reported by the bcache backing device (holder) and the nr of sectors
reported by its parent (slave).
   2. If the number of sectors of the two devices are not the same,
then call set_capacity_and_notify on the bcache device.
   3. From user space, depending on the fs used, grow the fs with some
utility (e.g. xfs_growfs)

This works without any need of unmounting the mounted fs nor stopping
the bcache backing device.

Well done! +1, would love to see a patch for this!

So my question is: am I missing something? Can this live resize cause
some problems (e.g. data loss)? Would it be useful if I send a patch on

A while a go we looked into doing this.  Here is the summary of our
findings, not sure if there are any other considerations:

   1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
      resize on echo 1 >.
   2. Refactor the set_capacity() bits from  bcache_device_init() into its
      own function.
   3. Put locks around bcache_device.full_dirty_stripes and
      bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
      new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
      used and make sure it is locked appropriately, probably in the
      writeback thread, maybe other places too.

The cachedev's don't know anything about the bdev size, so (according to
Kent) they will "just work" by referencing new offsets that were
previously beyond the disk. (This is basically the same as resizing the
bdev and then unregister/re-register which is how we resize bdevs now.)

As for resizing a cachedev, I've not looked at all---not sure about that
one.  We always detach, resize, make-bcache and re-attach the new cache.
Maybe it is similarly simple, but haven't looked.

Eric Wheeler

Kind regards,

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux