Re: [PATCH 0/3] krbd discard optimizations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 30, 2019 at 9:42 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Wed, Jan 30, 2019 at 8:16 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 30, 2019 at 9:41 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > Should we try to ensure that librbd and krbd have (semi) matching
> > > > discard behavior? In librbd, a full object discard on an overlapping
> > > > child will truncate the child object to size zero instead of issuing a
> > > > delete. With this krbd change, if a child object had previously been
> > > > written and now was discarded, the parent data will become visible
> > > > again since the child object is deleted. This could be prevented by
> > >
> > > Right, this is what librbd was doing until about a year ago when
> > > I fixed the parent data exposure with a compound create+truncate.
> > >
> > > > issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
> > > > zeroing of a full object extent if it was never written and would
> > > > prevent the parent data from becoming visible again after a discard.
> > >
> > > truncate on a non-existent object is a no-op, so I'm not sure how
> > > assert_exists+truncate(0) is different from plain truncate(0).
> >
> > Thanks, I forgot about that. My intent was to ensure an existing
> > object would be truncated to zero but not create an object if it
> > doesn't exist.
> >
> > > > Obviously the data is still consistent either way but it would be good
> > > > to match.
> > >
> > > I thought about this, but decided to drop reverse mapping and overlap
> > > calculation code from rbd_obj_setup_discard() simply because it would
> > > have been semi-matching at best.  It also seems clearer to me: in the
> > > zeroout case we worry about copyups and parent blocks, in the discard
> > > case we don't.  Special casing whole-object discards felt like mixing
> > > them up without a good reason.
> >
> > I was only minimally concerned about the case where the copy-up had
> > already occurred so your delete-on-discard would now re-expose the
> > parent object extent (e.g. parent has all 'A's, child had all 'B's, so
> > the discard switches the child to all 'A's instead of zeroes).
> > Effectively, a full object discard will cause your object extent to go
> > back-in-time.
>
> Well, returning stale (or random) data from the discarded region
> technically doesn't violate any specs.  On a second thought, I think
> you are right though -- I only dropped create+truncate because I wanted
> to avoid having to calculate the reverse mapping for mostly hand wavy
> reasons that definitely don't stand against the sneakiness of going
> back in time.  I'll put the "delete or create+truncate" logic back
> tomorrow.

... and of course for ensuring that an existing object is truncated but
not creating an object if it doesn't exist a simple truncate is enough.
A compound create+truncate is only needed for zeroout.

Sorry for the confusion, here is the patch on top:

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1f5517355af3..9efa7cd54e0c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1871,6 +1871,7 @@ static int rbd_obj_setup_discard(struct
rbd_obj_request *obj_req)
        struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
        u64 off = obj_req->ex.oe_off;
        u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len;
+       int ret;

        /*
         * Align the range to alloc_size boundary and punt on discards
@@ -1888,11 +1889,16 @@ static int rbd_obj_setup_discard(struct
rbd_obj_request *obj_req)
                        return 1;
        }

+       /* reverse map the entire object onto the parent */
+       ret = rbd_obj_calc_img_extents(obj_req, true);
+       if (ret)
+               return ret;
+
        obj_req->osd_req = rbd_osd_req_create(obj_req, 1);
        if (!obj_req->osd_req)
                return -ENOMEM;

-       if (rbd_obj_is_entire(obj_req)) {
+       if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
                osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
        } else {
                dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,

Thanks,

                Ilya



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux