Re: [RFC] Live resize of backing device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
From: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>
Date: Thu, 8 Sep 2022 09:47:55 +0200
Subject: [PATCH] bcache: Add support for live resize of backing devices

Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>

Hi Andrea,

I am just recovered from Omicron, and able to reply email. Let me place my comments inline.

Hi Coly,
Here is the first version of the patch. There are some points I noted down
that I would like to discuss with you:
 - I found it pretty convenient to hook the call of the new added function
   inside the `register_bcache`. In fact, every time (at least from my
   understandings) a disk changes size, it will trigger a new probe and,
   thus, `register_bcache` will be triggered. The only inconvenient
   is that, in case of success, the function will output

The resize should be triggered manually, and not to do it automatically.

You may create a sysfs file under the cached device's directory, name it as "extend_size" or something else you think better.

Then the sysadmin may extend the cached device size explicitly on a predictable time.

`error: capacity changed` even if it's not really an error.
 - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
   problem, right?
 - There is some reused code between this new function and
   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
   a broader scope or make it a macro, what do you think?

Thank you very much,

 drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ba3909bb6bea..9a77caf2a18f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
     return bch_is_open_cache(dev) || bch_is_open_backing(dev);

+static bool bch_update_capacity(dev_t dev)
+    const size_t max_stripes = min_t(size_t, INT_MAX,
+                     SIZE_MAX / sizeof(atomic_t));
+    uint64_t n, n_old;
+    int nr_stripes_old;
+    bool res = false;
+    struct bcache_device *d;
+    struct cache_set *c, *tc;
+    struct cached_dev *dcp, *t, *dc = NULL;
+    uint64_t parent_nr_sectors;
+    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
+        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
+            if (dcp->bdev->bd_dev == dev) {
+                dc = dcp;
+                goto dc_found;
+            }
+    if (!dc)
+        return false;
+    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
+    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
+        return false;

The above code only handles whole disk using as cached device. If a partition of a hard drive is used as cache device, and there are other data after this partition, such condition should be handled as well. So far I am fine to only extend size when using the whole hard drive as cached device, but more code is necessary to check and only permits size-extend for such condition.

+    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
+        return false;

The above code should be done when all rested are set.

+    d = &dc->disk;
+    /* Force cached device sectors re-calc */
+    calc_cached_dev_sectors(d->c);

Here c->cached_dev_sectors might be changed, if any of the following steps fails, it should be restored to previous value.

+    /* Block writeback thread */
+    down_write(&dc->writeback_lock);
+    nr_stripes_old = d->nr_stripes;
+    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
+    if (!n || n > max_stripes) {
+        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
+            n);
+        goto unblock_and_exit;
+    }
+    d->nr_stripes = n;
+    n = d->nr_stripes * sizeof(atomic_t);
+    n_old = nr_stripes_old * sizeof(atomic_t);
+    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
+        n, GFP_KERNEL);
+    if (!d->stripe_sectors_dirty)
+        goto unblock_and_exit;
+    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
+    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
+    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
+    if (!d->full_dirty_stripes)
+        goto unblock_and_exit;

If kvrealloc() fails and NULL set to d->full_dirty_sripes, the previous array should be restored.

+    res = true;
+    up_write(&dc->writeback_lock);
+    return res;

I didn't test the patch, from the first glance, I feel the failure handling should restore all previous values, otherwise the cache device may be in a non-consistent state when extend size fails.

 struct async_reg_args {
     struct delayed_work reg_work;
     char *path;
@@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
             if (lookup_bdev(strim(path), &dev) == 0 &&
-                err = "device already registered";
+                if (bch_update_capacity(dev))
+                    err = "capacity changed";
+                else
+                    err = "device already registered";

As I said, it should be a separated write-only sysfile under the cache device's directory.

                 err = "device busy";

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux