On Wed, Jan 30, 2019 at 9:42 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > On Wed, Jan 30, 2019 at 8:16 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > On Wed, Jan 30, 2019 at 9:41 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > > > > > On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > Should we try to ensure that librbd and krbd have (semi) matching > > > > discard behavior? In librbd, a full object discard on an overlapping > > > > child will truncate the child object to size zero instead of issuing a > > > > delete. With this krbd change, if a child object had previously been > > > > written and now was discarded, the parent data will become visible > > > > again since the child object is deleted. This could be prevented by > > > > > > Right, this is what librbd was doing until about a year ago when > > > I fixed the parent data exposure with a compound create+truncate. > > > > > > > issuing a "assert_exists" + "truncate(0)" op to prevent the implicit > > > > zeroing of a full object extent if it was never written and would > > > > prevent the parent data from becoming visible again after a discard. > > > > > > truncate on a non-existent object is a no-op, so I'm not sure how > > > assert_exists+truncate(0) is different from plain truncate(0). > > > > Thanks, I forgot about that. My intent was to ensure an existing > > object would be truncated to zero but not create an object if it > > doesn't exist. > > > > > > Obviously the data is still consistent either way but it would be good > > > > to match. > > > > > > I thought about this, but decided to drop reverse mapping and overlap > > > calculation code from rbd_obj_setup_discard() simply because it would > > > have been semi-matching at best. It also seems clearer to me: in the > > > zeroout case we worry about copyups and parent blocks, in the discard > > > case we don't. Special casing whole-object discards felt like mixing > > > them up without a good reason. > > > > I was only minimally concerned about the case where the copy-up had > > already occurred so your delete-on-discard would now re-expose the > > parent object extent (e.g. parent has all 'A's, child had all 'B's, so > > the discard switches the child to all 'A's instead of zeroes). > > Effectively, a full object discard will cause your object extent to go > > back-in-time. > > Well, returning stale (or random) data from the discarded region > technically doesn't violate any specs. On a second thought, I think > you are right though -- I only dropped create+truncate because I wanted > to avoid having to calculate the reverse mapping for mostly hand wavy > reasons that definitely don't stand against the sneakiness of going > back in time. I'll put the "delete or create+truncate" logic back > tomorrow. ... and of course for ensuring that an existing object is truncated but not creating an object if it doesn't exist a simple truncate is enough. A compound create+truncate is only needed for zeroout. Sorry for the confusion, here is the patch on top: diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 1f5517355af3..9efa7cd54e0c 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1871,6 +1871,7 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req) struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; u64 off = obj_req->ex.oe_off; u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len; + int ret; /* * Align the range to alloc_size boundary and punt on discards @@ -1888,11 +1889,16 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req) return 1; } + /* reverse map the entire object onto the parent */ + ret = rbd_obj_calc_img_extents(obj_req, true); + if (ret) + return ret; + obj_req->osd_req = rbd_osd_req_create(obj_req, 1); if (!obj_req->osd_req) return -ENOMEM; - if (rbd_obj_is_entire(obj_req)) { + if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) { osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0); } else { dout("%s %p %llu~%llu -> %llu~%llu\n", __func__, Thanks, Ilya