Re: [PATCH 19/20] rbd: support for object-map and fast-diff

Ilya Dryomov <idryomov@xxxxxxxxx> · Wed, 3 Jul 2019 15:15:18 +0200

On Tue, Jul 2, 2019 at 1:55 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Tue, Jul 2, 2019 at 2:16 AM Dongsheng Yang
> <dongsheng.yang@xxxxxxxxxxxx> wrote:
> >
> > Hi Jason,
> >
> > On 07/02/2019 12:09 AM, Jason Dillaman wrote:
> > > On Tue, Jun 25, 2019 at 10:42 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > >> Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
> > >> and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.
> > >>
> > >> Invalid object maps are not trusted, but still updated.  Note that we
> > >> never iterate, resize or invalidate object maps.  If object-map feature
> > >> is enabled but object map fails to load, we just fail the requester
> > >> (either "rbd map" or I/O, by way of post-acquire action).
> > ... ...
> > >> +}
> > >> +
> > >> +static int rbd_object_map_open(struct rbd_device *rbd_dev)
> > >> +{
> > >> +       int ret;
> > >> +
> > >> +       ret = rbd_object_map_lock(rbd_dev);
> > > Only lock/unlock if rbd_dev->spec.snap_id == CEPH_NOSNAP?
> >
> > Hmm, IIUC, rbd_object_map_open() is called in post exclusive-lock
> > acquired, when
> > rbd_dev->spec.snap_id != CEPH_NOSNAP, we don't need to acquire
> > exclusive-lock I think.
>
> Userspace opens the object-map for snapshots (and therefore parent
> images) are well. It doesn't require the exclusive-lock since the
> images should be immutable at the snapshot.
>
> > But maybe we can add an assert in this function to make it clear.
> > >
> > >> +       if (ret)
> > >> +               return ret;
> > >> +
> > >> +       ret = rbd_object_map_load(rbd_dev);
> > >> +       if (ret) {
> > >> +               rbd_object_map_unlock(rbd_dev);
> > >> +               return ret;
> > >> +       }
> > >> +
> > >> +       return 0;
> > >> +}
> > >> +
> > >> +static void rbd_object_map_close(struct rbd_device *rbd_dev)
> > >> +{
> > >> +       rbd_object_map_free(rbd_dev);
> > >> +       rbd_object_map_unlock(rbd_dev);
> > >> +}
> > >> +
> > >> +/*
> > >> + * This function needs snap_id (or more precisely just something to
> > >> + * distinguish between HEAD and snapshot object maps), new_state and
> > >> + * current_state that were passed to rbd_object_map_update().
> > >> + *
> > >> + * To avoid allocating and stashing a context we piggyback on the OSD
> > >> + * request.  A HEAD update has two ops (assert_locked).  For new_state
> > >> + * and current_state we decode our own object_map_update op, encoded in
> > >> + * rbd_cls_object_map_update().
> > > Decoding the OSD request seems a little awkward. Since you would only
> > > update the in-memory state for the HEAD revision, could you just stash
> > > these fields in the "rbd_object_request" struct? Then in
> > > "rbd_object_map_update", set the callback to either a
> > > "rbd_object_map_snapshot_callback" callback or
> > > "rbd_object_map_head_callback".
> >
> > Good idea, even we don't need two callback, because the
> > rbd_object_map_update_finish() will not update snapshot:
>
> The deep-flatten feature requires updating object-map snapshots for
> the child image during copy-up. There just isn't an in-memory version
> of the object-map for that case, if that is what you are refering to.
>
> > +    /*
> > +     * Nothing to do for a snapshot object map.
> > +     */
> > +    if (osd_req->r_num_ops == 1)
> > +        return 0;
> > >> + */
> > >> +static int rbd_object_map_update_finish(struct rbd_obj_request *obj_req,
> > >> +                                       struct ceph_osd_request *osd_req)
> > >> +{
> > >> +       struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
> > >> +       struct ceph_osd_data *osd_data;
> > >> +       u64 objno;
> > >> +       u8 state, new_state, current_state;
> > >> +       bool has_current_state;
> > >> +       void *p;
> > ... ...
> > >>
> > >> +/*
> > >> + * Return:
> > >> + *   0 - object map update sent
> > >> + *   1 - object map update isn't needed
> > >> + *  <0 - error
> > >> + */
> > >> +static int rbd_obj_write_post_object_map(struct rbd_obj_request *obj_req)
> > >> +{
> > >> +       struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
> > >> +       u8 current_state = OBJECT_PENDING;
> > >> +
> > >> +       if (!(rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP))
> > >> +               return 1;
> > >> +
> > >> +       if (!(obj_req->flags & RBD_OBJ_FLAG_DELETION))
> > >> +               return 1;
> > >> +
> > >> +       return rbd_object_map_update(obj_req, CEPH_NOSNAP, OBJECT_NONEXISTENT,
> > >> +                                    &current_state);
> > >> +}
> > >> +
> > >>   static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result)
> > >>   {
> > >>          struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
> > >> @@ -2805,6 +3419,24 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result)
> > >>          case RBD_OBJ_WRITE_START:
> > >>                  rbd_assert(!*result);
> > >>
> > >> +               if (rbd_obj_write_is_noop(obj_req))
> > >> +                       return true;
> > > Does this properly handle the case where it has a parent overlap? If
> > > the child object doesn't exist, we would still want to perform the
> > > copyup (if required), correct?
> >
> > Good point. I found the zeroout case is handled, but discard not.
> >
> > zeroout will check  (!obj_req->num_img_extents) before setting NOOP
> > flag. but discard check it after setting.

We never perform the copyup for discard because, unlike in librbd,
discard and zeroout are two different operations.

num_img_extents is checked just to pick between delete and truncate.
If there is overlap, delete is a bad choice because it would expose
parent data, effectively going back in time.  While technically a read
from a discarded region is allowed to return _anything_, zeros is what
many people expect, so we truncate instead of deleting in this case.

Thanks,

                Ilya