Those are all internal classes -- the cls types are already dependencies
within the librbd internals. Feel free to add the necessary include and use
it directly from within librbd.

On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
> I have a question about where the SnapshotNamespace type should be placed.
> I placed it in cls/rbd/cls_rbd_types.h because both the cls client and the
> cls backend components need access to this type.
> The type is also required in librbd/Operations.cc, because we want to
> specify in which namespace Operations::snap_create should create snapshots.
> However, Operations.cc doesn't include cls_rbd_types.h right now. If the
> question were about the public interface of librbd/librbd.cc, I would
> create a duplicate of the SnapshotNamespace type in the librbd layer
> without hesitation.
> But these functions are internal, so my question is whether it really
> makes sense to create another SnapshotNamespace type in the librbd layer.
>
>
> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
>> Right, I forgot about snapshot "namespaces". I'll add this part.
>> I guess it makes sense to discuss the whole thing at the next CDM.
>>
>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>> I think the first step is to implement the concept of snapshot "namespaces".
>>>
>>> This could be implemented as an optional variant structure associated
>>> with each snapshot at creation (see the ImageWatcher RPC messages or the
>>> journaling event type encoding for examples of this). For consistency
>>> group snapshots, this structure would identify the snapshot as belonging
>>> to the consistency group and carry a unique id back to the specific
>>> group snapshot.
>>>
>>> When creating a snapshot, the state machine would (1) create the group
>>> snapshot record, (2) set the state of the group to "creating snapshot"
>>> (to prevent images from being added to or removed from the group while
>>> the op is in progress), (3) acquire the exclusive lock for all images in
>>> the group, (4) create the individual image snapshots with the linkage
>>> back to the group snapshot record (this can be performed in parallel up
>>> to the max number of concurrent ops), (5) release the exclusive locks,
>>> and (6) reset the group status to "ready".
>>>
>>> If you have a hard crash/failure anywhere, a "snap remove" operation
>>> should be designed to get the group back into a consistent state (i.e.
>>> remove any snapshots linked to the group and reset the group state back
>>> to "ready").
>>>
>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
>>>> Guys,
>>>>
>>>> I updated the Snapshots section of this document,
>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my
>>>> improved understanding of how it should be implemented.
>>>> Please take a look and provide your comments. Some of my concerns
>>>> regarding the implementation are highlighted in bold.
>>>>
>>>> Looking forward to your valuable remarks.
>>>> Thanks in advance.
>>>> V.
>>>>
>>>>
>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@xxxxxxxxxxxx> wrote:
>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:
>>>>>> What if I'm holding this lock and somebody else is trying to
>>>>>> reacquire the lock? How do I get notified about it?
>>>>>
>>>>> The image watcher is notified, which triggers its handler:
>>>>>
>>>>>   ImageWatcher<I>::handle_payload(const RequestLockPayload &payload, C_NotifyAck *ack_ctx)
>>>>>
>>>>> The handler calls the current lock policy's `lock_requested()` method,
>>>>> which defines what to do with the lock request. The StandardPolicy is
>>>>> to release the lock, so the lock may ping-pong between the clients.
>>>>> You may define a different policy -- rbd-mirror is an example of where
>>>>> this is used.
>>>>>
>>>>> Everywhere an operation needs the exclusive lock, the code first
>>>>> checks whether we are currently the lock owner, i.e.:
>>>>>
>>>>>   ictx->exclusive_lock->is_lock_owner()
>>>>>
>>>>> and if that is false, the exclusive lock is requested. Before this
>>>>> check you need to acquire ictx->owner_lock, and until you release
>>>>> owner_lock you can be sure your exclusive lock will not leak to
>>>>> another client. After releasing owner_lock, you will need to repeat
>>>>> the check the next time you need it.
>>>>>
>>>>> --
>>>>> Mykola Golub
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@xxxxxxxxxxxx> wrote:
>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
>>>>>> >> Could you please point me to the place in the source code where a
>>>>>> >> writer acquires an exclusive lock on the image.
>>>>>> >
>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need
>>>>>> > (after opening the image) is:
>>>>>> >
>>>>>> > ```
>>>>>> > C_SaferCond lock_ctx;
>>>>>> > {
>>>>>> >   RWLock::WLocker l(ictx->owner_lock);
>>>>>> >
>>>>>> >   if (ictx->exclusive_lock == nullptr) {
>>>>>> >     // exclusive-lock feature is not enabled
>>>>>> >     return -EINVAL;
>>>>>> >   }
>>>>>> >
>>>>>> >   // Request the lock. If it is currently owned by another client,
>>>>>> >   // an RPC message will be sent to that client asking it to
>>>>>> >   // release the lock.
>>>>>> >   ictx->exclusive_lock->request_lock(&lock_ctx);
>>>>>> > } // release owner_lock before waiting to avoid a potential deadlock
>>>>>> >
>>>>>> > int r = lock_ctx.wait();
>>>>>> > if (r < 0) {
>>>>>> >   return r;
>>>>>> > }
>>>>>> >
>>>>>> > RWLock::RLocker l(ictx->owner_lock);
>>>>>> > if (ictx->exclusive_lock == nullptr ||
>>>>>> >     !ictx->exclusive_lock->is_lock_owner()) {
>>>>>> >   // failed to acquire the exclusive lock
>>>>>> >   return -EROFS;
>>>>>> > }
>>>>>> >
>>>>>> > // At this point the lock is acquired
>>>>>> > ...
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > You might want to look at this PR
>>>>>> >
>>>>>> >   https://github.com/ceph/ceph/pull/9592
>>>>>> >
>>>>>> > where we discuss adding API methods to directly acquire and release
>>>>>> > the exclusive lock. You don't need the API, but you will find
>>>>>> > examples in the patch, and also useful comments from Jason.
>>>>>> >
>>>>>> > --
>>>>>> > Mykola Golub
>>>>>> >
>>>>>> >> I presume you were talking about the exclusive_lock/shared_lock
>>>>>> >> feature, which can be used from the command line via the
>>>>>> >> `lock list` and `lock break` commands.
>>>>>> >>
>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>>>>> >> > There is already a "request lock" RPC message, and this is already
>>>>>> >> > handled transparently within librbd when you attempt to acquire
>>>>>> >> > the lock and another client owns it.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
>>>>>> >> >>
>>>>>> >> >> If an image already has a writer who owns the lock, should I
>>>>>> >> >> implement a notification that allows us to ask the writer to
>>>>>> >> >> release the lock, or is there already a standard way to take over
>>>>>> >> >> the exclusive lock?
>>>>>> >> >>
>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
>>>>>> >> >> wrote:
>>>>>> >> >> > ... one more thing:
>>>>>> >> >> >
>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to
>>>>>> >> >> > indicate that an image is part of a consistency group, to
>>>>>> >> >> > prevent older librbd clients from removing the image or the
>>>>>> >> >> > group snapshots. This could be an RBD_FEATURES_RW_INCOMPATIBLE
>>>>>> >> >> > feature bit so older clients can still open the image R/O while
>>>>>> >> >> > it's part of a group.
>>>>>> >> >> >
>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
>>>>>> >> >> > wrote:
>>>>>> >> >> >> Way back in April when we had the CDM, I was originally
>>>>>> >> >> >> thinking we should implement option 3. Essentially, you have a
>>>>>> >> >> >> prepare group snapshot RPC message that extends a "paused IO"
>>>>>> >> >> >> lease to the caller. When that lease expires, IO would
>>>>>> >> >> >> automatically be resumed even if the group snapshot hasn't been
>>>>>> >> >> >> created yet. This would also require commit/abort group
>>>>>> >> >> >> snapshot RPC messages.
>>>>>> >> >> >>
>>>>>> >> >> >> However, thinking about this last night, here is another
>>>>>> >> >> >> potential option:
>>>>>> >> >> >>
>>>>>> >> >> >> Option 4 - require images to have the exclusive-lock feature
>>>>>> >> >> >> before they can be added to a consistency group (and prevent
>>>>>> >> >> >> disabling of exclusive-lock while they are part of a group).
>>>>>> >> >> >> Then librbd, via the rbd CLI (or a client application of the
>>>>>> >> >> >> rbd consistency group snap create API), can co-operatively
>>>>>> >> >> >> acquire the lock from all active image clients within the group
>>>>>> >> >> >> (i.e. all IO has been flushed and paused) and can proceed with
>>>>>> >> >> >> snapshot creation. If the rbd CLI dies, the normal exclusive
>>>>>> >> >> >> lock handling process will automatically take care of
>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO.
>>>>>> >> >> >>
>>>>>> >> >> >> This option not only re-uses existing code, it would also
>>>>>> >> >> >> eliminate the need to add/update the RPC messages for
>>>>>> >> >> >> prepare/commit/abort snapshot creation to support group
>>>>>> >> >> >> snapshots (since it could all be handled internally).
>>>>>> >> >> >>
>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx>
>>>>>> >> >> >> wrote:
>>>>>> >> >> >>> Gentlemen,
>>>>>> >> >> >>>
>>>>>> >> >> >>> I'm writing to ask for your opinion regarding quiescing writes.
>>>>>> >> >> >>>
>>>>>> >> >> >>> Here is the situation. In order to take snapshots of all
>>>>>> >> >> >>> images in a consistency group, we first need to quiesce all
>>>>>> >> >> >>> the image writers in the consistency group.
>>>>>> >> >> >>> Let me use the following terms:
>>>>>> >> >> >>> group client - a client which requests a consistency group to
>>>>>> >> >> >>> take a snapshot;
>>>>>> >> >> >>> image client - a client that writes to an image.
>>>>>> >> >> >>> Let's say the group client starts sending notify_quiesce to
>>>>>> >> >> >>> all image clients that write to the images in the group. After
>>>>>> >> >> >>> quiescing half of the image clients, the group client can die.
>>>>>> >> >> >>>
>>>>>> >> >> >>> This presents us with a dilemma: what should we do with those
>>>>>> >> >> >>> quiesced image clients?
>>>>>> >> >> >>>
>>>>>> >> >> >>> Option 1 is to wait until someone manually runs recover for
>>>>>> >> >> >>> that consistency group. We can show a warning next to those
>>>>>> >> >> >>> unfinished groups when the user runs the group list command.
>>>>>> >> >> >>> There would be a command like group recover, which allows
>>>>>> >> >> >>> users to roll back unsuccessful snapshots or complete them
>>>>>> >> >> >>> using the create snapshot command.
>>>>>> >> >> >>>
>>>>>> >> >> >>> Option 2 is to establish heartbeats between the group client
>>>>>> >> >> >>> and the image clients. If the group client fails to heartbeat,
>>>>>> >> >> >>> the image client unquiesces itself and continues normal
>>>>>> >> >> >>> operation.
>>>>>> >> >> >>>
>>>>>> >> >> >>> Option 3 is to have a timeout on each image client. If the
>>>>>> >> >> >>> group client fails to create the group snapshot within this
>>>>>> >> >> >>> timeout, the image client resumes normal operation and informs
>>>>>> >> >> >>> the group client of that fact.
>>>>>> >> >> >>>
>>>>>> >> >> >>> Which of these options do you prefer? There are probably other
>>>>>> >> >> >>> options that I have missed.
>>>>>> >> >> >>>
>>>>>> >> >> >>> Thanks,
>>>>>> >> >> >>> Victor.
>>>>>> >> >> >>
>>>>>> >> >> >>
>>>>>> >> >> >> --
>>>>>> >> >> >> Jason
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > --
>>>>>> >> >> > Jason
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > --
>>>>>> >> > Jason
>>>>>> >> >
>>>
>>>
>>> --
>>> Jason

--
Jason
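
For reference, the "optional variant structure" Jason describes earlier in
the thread could be sketched roughly as below. The type and field names here
(UserSnapshotNamespace, GroupSnapshotNamespace, and the
group_pool/group_id/group_snapshot_id linkage) are illustrative assumptions,
not the committed cls_rbd_types.h definitions; the point is only that each
snapshot carries a tagged variant identifying which namespace it belongs to,
and that a group snapshot links back to the owning group snapshot record.

```cpp
// Sketch only: illustrative names, not the final cls_rbd_types.h API.
#include <boost/variant.hpp>
#include <cstdint>
#include <string>

namespace cls {
namespace rbd {

// Ordinary user-created snapshot: no extra linkage required.
struct UserSnapshotNamespace {
};

// Snapshot created as part of a consistency-group snapshot: carries a
// unique link back to the specific group snapshot record so that a
// "snap remove" style cleanup can find partially created group snapshots
// and reset the group state.
struct GroupSnapshotNamespace {
  int64_t group_pool = -1;        // pool holding the group object
  std::string group_id;           // id of the consistency group
  std::string group_snapshot_id;  // id of the group snapshot record
};

// The per-snapshot namespace stored alongside each snapshot at creation
// time.  encode()/decode() would follow the same tag-plus-payload pattern
// as the ImageWatcher RPC messages or the journaling event types.
using SnapshotNamespace =
    boost::variant<UserSnapshotNamespace, GroupSnapshotNamespace>;

} // namespace rbd
} // namespace cls
```

Keeping a type like this in cls/rbd/cls_rbd_types.h, as Victor suggests,
would let the cls client, the cls backend, and librbd/Operations.cc share a
single definition instead of duplicating it in the librbd layer.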
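
Similarly, the "different policy" Mykola mentions for handling a request-lock
notification is just a subclass of the exclusive-lock policy interface. The
sketch below assumes a `librbd::exclusive_lock::Policy` base class with a
single `lock_requested(bool force)` virtual; the real interface (see
src/librbd/exclusive_lock/Policy.h) may differ between releases, so treat
this as illustrative rather than authoritative.

```cpp
// Illustrative only -- assumes the Policy interface described in the
// lead-in; the real definition lives in src/librbd/exclusive_lock/Policy.h.
namespace librbd {
namespace exclusive_lock {

// Assumed base class.  StandardPolicy's lock_requested() releases the
// lock, which is what allows the lock to ping-pong between two busy
// clients.
struct Policy {
  virtual ~Policy() {}
  virtual void lock_requested(bool force) = 0;
};

// A policy that never volunteers the lock: a peer's request-lock
// notification is simply ignored, so this client remains the lock owner
// until it releases the lock itself (rbd-mirror installs a policy along
// these lines).
struct KeepLockPolicy : public Policy {
  void lock_requested(bool force) override {
    // Deliberately do nothing.
  }
};

} // namespace exclusive_lock
} // namespace librbd
```

A group client that needs to hold the per-image locks for the duration of a
group snapshot could install a policy along these lines so the locks are not
pulled away mid-operation, which is essentially what option 4 relies on.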