On Fri, Mar 23, 2018 at 10:31 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > On Thu, Mar 22, 2018 at 12:57 PM, 亀井仁志 / KAMEI,HITOSHI > <hitoshi.kamei.xm@xxxxxxxxxxx> wrote: >> Hi Yang, >> >>> I am not sure is this the best way for this case, what about adding an option in "rbd map -o thick rbd/test"? >> >> I will add such option to the rbd map command to manipulate image settings. So, the end-user >> do not change the settings directly via sysfs file. >> >>> @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work) >>> goto err; >>> } >>> >>> + /* Ignore/skip discard requests for thick-provision image */ >>> >>> Just ignore? or return -EOPNOTSUPP? >> >> Thanks, I think -EOPNOTSUPP is better because user programs cannot know >> the result of requested operation when the kernel rbd driver ignores >> discard request. The result of requested operation when the kernel rbd driver >> ignores discard requests, which probably misleads the user programs. >> >>> In addition, we should not ignore the REQ_OP_WRITE_ZEROES. >> >> Relating to the above, the return code of REQ_OP_WRITE_ZEROS request >> is also -EOPNOTSUPP instead of ignoring. I think the result of >> -EOPNOTSUPP is also better for this request because the kernel >> rbd driver can expect that user programs write zero data by itself. > > REQ_OP_WRITE_ZEROES should continue to work, we just need to make > sure it never issues truncates or deletes and instead writes zeroes > explicitly. > > I think we should be explicit about the fact that discard is not > supported instead of accepting the discard request and failing it in > rbd_queue_workfn(). Attached patch is what I have in mind. Now with the patch. Thanks, Ilya
From 1db405366371cb12e5182815c06f6f491af4b63c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov <idryomov@xxxxxxxxx> Date: Fri, 23 Mar 2018 09:14:47 +0100 Subject: [PATCH] rbd: notrim map option Add an option to turn off discard and write zeroes offload support to avoid deprovisioning a fully provisioned image. When enabled, discard requests will fail with -EOPNOTSUPP, write zeroes requests will fall back to manually zeroing. Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> --- drivers/block/rbd.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 1a90e5dd7470..dbe440a6531e 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -733,6 +733,7 @@ enum { Opt_read_write, Opt_lock_on_read, Opt_exclusive, + Opt_notrim, Opt_err }; @@ -746,6 +747,7 @@ static match_table_t rbd_opts_tokens = { {Opt_read_write, "rw"}, /* Alternate spelling */ {Opt_lock_on_read, "lock_on_read"}, {Opt_exclusive, "exclusive"}, + {Opt_notrim, "notrim"}, {Opt_err, NULL} }; @@ -754,12 +756,14 @@ struct rbd_options { bool read_only; bool lock_on_read; bool exclusive; + bool trim; }; #define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ #define RBD_READ_ONLY_DEFAULT false #define RBD_LOCK_ON_READ_DEFAULT false #define RBD_EXCLUSIVE_DEFAULT false +#define RBD_TRIM_DEFAULT true static int parse_rbd_opts_token(char *c, void *private) { @@ -801,6 +805,9 @@ static int parse_rbd_opts_token(char *c, void *private) case Opt_exclusive: rbd_opts->exclusive = true; break; + case Opt_notrim: + rbd_opts->trim = false; + break; default: /* libceph prints "bad option" msg */ return -EINVAL; @@ -3940,11 +3947,12 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) blk_queue_io_min(q, objset_bytes); blk_queue_io_opt(q, objset_bytes); - /* enable the discard support */ - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); - q->limits.discard_granularity = objset_bytes; - blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT); - blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT); + if (rbd_dev->opts->trim) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); + q->limits.discard_granularity = objset_bytes; + blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT); + blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT); + } if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; @@ -5170,6 +5178,7 @@ static int rbd_add_parse_args(const char *buf, rbd_opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT; rbd_opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT; rbd_opts->exclusive = RBD_EXCLUSIVE_DEFAULT; + rbd_opts->trim = RBD_TRIM_DEFAULT; copts = ceph_parse_options(options, mon_addrs, mon_addrs + mon_addrs_size - 1, -- 2.4.3