On Tue, Jul 2, 2019 at 1:55 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > On Tue, Jul 2, 2019 at 2:16 AM Dongsheng Yang > <dongsheng.yang@xxxxxxxxxxxx> wrote: > > > > Hi Jason, > > > > On 07/02/2019 12:09 AM, Jason Dillaman wrote: > > > On Tue, Jun 25, 2019 at 10:42 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > >> Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST > > >> and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map. > > >> > > >> Invalid object maps are not trusted, but still updated. Note that we > > >> never iterate, resize or invalidate object maps. If object-map feature > > >> is enabled but object map fails to load, we just fail the requester > > >> (either "rbd map" or I/O, by way of post-acquire action). > > ... ... > > >> +} > > >> + > > >> +static int rbd_object_map_open(struct rbd_device *rbd_dev) > > >> +{ > > >> + int ret; > > >> + > > >> + ret = rbd_object_map_lock(rbd_dev); > > > Only lock/unlock if rbd_dev->spec.snap_id == CEPH_NOSNAP? > > > > Hmm, IIUC, rbd_object_map_open() is called in post exclusive-lock > > acquired, when > > rbd_dev->spec.snap_id != CEPH_NOSNAP, we don't need to acquire > > exclusive-lock I think. > > Userspace opens the object-map for snapshots (and therefore parent > images) are well. It doesn't require the exclusive-lock since the > images should be immutable at the snapshot. > > > But maybe we can add an assert in this function to make it clear. > > > > > >> + if (ret) > > >> + return ret; > > >> + > > >> + ret = rbd_object_map_load(rbd_dev); > > >> + if (ret) { > > >> + rbd_object_map_unlock(rbd_dev); > > >> + return ret; > > >> + } > > >> + > > >> + return 0; > > >> +} > > >> + > > >> +static void rbd_object_map_close(struct rbd_device *rbd_dev) > > >> +{ > > >> + rbd_object_map_free(rbd_dev); > > >> + rbd_object_map_unlock(rbd_dev); > > >> +} > > >> + > > >> +/* > > >> + * This function needs snap_id (or more precisely just something to > > >> + * distinguish between HEAD and snapshot object maps), new_state and > > >> + * current_state that were passed to rbd_object_map_update(). > > >> + * > > >> + * To avoid allocating and stashing a context we piggyback on the OSD > > >> + * request. A HEAD update has two ops (assert_locked). For new_state > > >> + * and current_state we decode our own object_map_update op, encoded in > > >> + * rbd_cls_object_map_update(). > > > Decoding the OSD request seems a little awkward. Since you would only > > > update the in-memory state for the HEAD revision, could you just stash > > > these fields in the "rbd_object_request" struct? Then in > > > "rbd_object_map_update", set the callback to either a > > > "rbd_object_map_snapshot_callback" callback or > > > "rbd_object_map_head_callback". > > > > Good idea, even we don't need two callback, because the > > rbd_object_map_update_finish() will not update snapshot: > > The deep-flatten feature requires updating object-map snapshots for > the child image during copy-up. There just isn't an in-memory version > of the object-map for that case, if that is what you are refering to. > > > + /* > > + * Nothing to do for a snapshot object map. > > + */ > > + if (osd_req->r_num_ops == 1) > > + return 0; > > >> + */ > > >> +static int rbd_object_map_update_finish(struct rbd_obj_request *obj_req, > > >> + struct ceph_osd_request *osd_req) > > >> +{ > > >> + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; > > >> + struct ceph_osd_data *osd_data; > > >> + u64 objno; > > >> + u8 state, new_state, current_state; > > >> + bool has_current_state; > > >> + void *p; > > ... ... > > >> > > >> +/* > > >> + * Return: > > >> + * 0 - object map update sent > > >> + * 1 - object map update isn't needed > > >> + * <0 - error > > >> + */ > > >> +static int rbd_obj_write_post_object_map(struct rbd_obj_request *obj_req) > > >> +{ > > >> + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; > > >> + u8 current_state = OBJECT_PENDING; > > >> + > > >> + if (!(rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) > > >> + return 1; > > >> + > > >> + if (!(obj_req->flags & RBD_OBJ_FLAG_DELETION)) > > >> + return 1; > > >> + > > >> + return rbd_object_map_update(obj_req, CEPH_NOSNAP, OBJECT_NONEXISTENT, > > >> + ¤t_state); > > >> +} > > >> + > > >> static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) > > >> { > > >> struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; > > >> @@ -2805,6 +3419,24 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) > > >> case RBD_OBJ_WRITE_START: > > >> rbd_assert(!*result); > > >> > > >> + if (rbd_obj_write_is_noop(obj_req)) > > >> + return true; > > > Does this properly handle the case where it has a parent overlap? If > > > the child object doesn't exist, we would still want to perform the > > > copyup (if required), correct? > > > > Good point. I found the zeroout case is handled, but discard not. > > > > zeroout will check (!obj_req->num_img_extents) before setting NOOP > > flag. but discard check it after setting. We never perform the copyup for discard because, unlike in librbd, discard and zeroout are two different operations. num_img_extents is checked just to pick between delete and truncate. If there is overlap, delete is a bad choice because it would expose parent data, effectively going back in time. While technically a read from a discarded region is allowed to return _anything_, zeros is what many people expect, so we truncate instead of deleting in this case. Thanks, Ilya