On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:

> What if I'm holding this lock and somebody else is trying to reacquire
> the lock. How do I get notified about it?

The image watcher is notified, which triggers its handler:

  ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx)

The handler calls the current lock policy method `lock_requested()`,
which defines what to do with the lock request. The StandardPolicy is to
release the lock, so the lock may ping-pong between clients. You may
define a different policy -- rbd-mirror is an example where this is done.

Everywhere an operation needs the exclusive lock, we first check whether
we are currently the lock owner, i.e.:

  ictx->exclusive_lock->is_lock_owner()

and if it is false, the exclusive lock is requested. Before this check
you need to acquire ictx->owner_lock, and until you release owner_lock
you can be sure the exclusive lock will not leak to another client.
After releasing owner_lock, you will need to repeat the check when you
need it again.
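In case a sketch helps, here is a rough, self-contained illustration of
that policy idea. The `LockPolicy` interface below is a simplified
stand-in written just for this example -- it is not the actual librbd
header, and the rbd-mirror behaviour is only approximated:

```
#include <iostream>

// Simplified stand-in for the lock policy interface, for illustration
// only -- the real classes live in the librbd sources and differ in detail.
struct LockPolicy {
  virtual ~LockPolicy() {}
  // invoked from the image watcher's handler when a peer client asks us
  // to release the exclusive lock
  virtual void lock_requested(bool force) = 0;
};

// Roughly what the standard policy does: always give the lock up, so
// ownership may ping-pong between competing clients.
struct ReleasingPolicy : LockPolicy {
  void lock_requested(bool /*force*/) override {
    std::cout << "peer requested the lock -> schedule a release" << std::endl;
  }
};

// Roughly the rbd-mirror style of policy: keep the lock unless the
// request is forced, so the local client stays the single writer.
struct StickyPolicy : LockPolicy {
  void lock_requested(bool force) override {
    if (force) {
      std::cout << "forced request -> release the lock" << std::endl;
    } else {
      std::cout << "ignore the request, keep ownership" << std::endl;
    }
  }
};

int main() {
  StickyPolicy policy;
  policy.lock_requested(false);  // what the watcher handler would invoke
  policy.lock_requested(true);
  return 0;
}
```

The point is only that the decision about what to do when a peer asks
for the lock is concentrated in the policy object the image watcher
consults, so a different client (rbd-mirror, or the group snapshot code
discussed below) can plug in its own behaviour.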
--
Mykola Golub

> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@xxxxxxxxxxxx> wrote:
> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
> >> Could you please point me to the place in the source code where a
> >> writer acquires an exclusive lock on the image?
> >
> > Grep for 'exclusive_lock->request_lock'. Basically, what you need
> > (after opening the image) is:
> >
> > ```
> > C_SaferCond lock_ctx;
> > {
> >   RWLock::WLocker l(ictx->owner_lock);
> >
> >   if (ictx->exclusive_lock == nullptr) {
> >     // exclusive-lock feature is not enabled
> >     return -EINVAL;
> >   }
> >
> >   // Request the lock. If it is currently owned by another client, an
> >   // RPC message will be sent to that client asking it to release the lock.
> >   ictx->exclusive_lock->request_lock(&lock_ctx);
> > } // release owner_lock before waiting to avoid potential deadlock
> >
> > int r = lock_ctx.wait();
> > if (r < 0) {
> >   return r;
> > }
> >
> > RWLock::RLocker l(ictx->owner_lock);
> > if (ictx->exclusive_lock == nullptr ||
> >     !ictx->exclusive_lock->is_lock_owner()) {
> >   // failed to acquire the exclusive lock
> >   return -EROFS;
> > }
> >
> > // At this point the lock is acquired
> > ...
> > ```
> >
> > You might want to look at this PR
> >
> > https://github.com/ceph/ceph/pull/9592
> >
> > where we discuss adding API methods to directly acquire and release
> > the exclusive lock. You don't need the API itself, but you will find
> > examples in the patch, and also useful comments from Jason.
> >
> > --
> > Mykola Golub
> >
> >> I presume you were talking about the exclusive_lock/shared_lock
> >> feature, which can be used from the command line via the commands
> >> lock list and lock break.
> >>
> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >> > There is already a "request lock" RPC message, and it is handled
> >> > transparently within librbd when you attempt to acquire the lock
> >> > and another client owns it.
> >> >
> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> If an image already has a writer who owns the lock,
> >> >> should I implement a notification that allows asking the writer to
> >> >> release the lock, or is there already a standard way to intercept
> >> >> the exclusive lock?
> >> >>
> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> >> wrote:
> >> >> > ...
> >> >> > one more thing:
> >> >> >
> >> >> > I was also thinking that we need a new RBD feature bit to be used to
> >> >> > indicate that an image is part of a consistency group, to prevent older
> >> >> > librbd clients from removing the image or group snapshots. This could
> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can
> >> >> > still open the image R/O while it's part of a group.
> >> >> >
> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> >> > wrote:
> >> >> >> Way back in April when we had the CDM, I was originally thinking we
> >> >> >> should implement option 3. Essentially, you have a prepare group
> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller.
> >> >> >> When that lease expires, IO would automatically be resumed even if the
> >> >> >> group snapshot hasn't been created yet. This would also require
> >> >> >> commit/abort group snapshot RPC messages.
> >> >> >>
> >> >> >> However, thinking about this last night, here is another potential
> >> >> >> option:
> >> >> >>
> >> >> >> Option 4 - require images to have the exclusive lock feature before
> >> >> >> they can be added to a consistency group (and prevent disabling of
> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the
> >> >> >> rbd CLI (or a client application of the rbd consistency group snap
> >> >> >> create API), can co-operatively acquire the lock from all active image
> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and
> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal
> >> >> >> exclusive lock handling process will automatically take care of
> >> >> >> re-acquiring the lock from the dead client and resuming IO.
> >> >> >>
> >> >> >> This option not only re-uses existing code, it would also eliminate
> >> >> >> the need to add/update the RPC messages for prepare/commit/abort
> >> >> >> snapshot creation to support group snapshots (since it could all be
> >> >> >> handled internally).
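For what it's worth, the control flow of option 4 could be sketched
roughly like this. It is only an illustration of the shape of the idea:
`Image` and its member functions below are placeholders invented for the
sketch, not the real librbd image context or API.

```
#include <iostream>
#include <string>
#include <vector>

// Placeholder for an open image in the group. The member functions are
// invented for this sketch and are not real librbd calls.
struct Image {
  explicit Image(const std::string &name_) : name(name_) {}
  std::string name;

  // "request the exclusive lock and wait": the current owner is asked,
  // via the existing request-lock RPC, to flush and release, after which
  // the caller becomes the owner and the peer's IO stays paused
  int acquire_exclusive_lock() { return 0; }

  int create_snapshot(const std::string &snap_name) {
    std::cout << "snapshot " << snap_name << " of " << name << std::endl;
    return 0;
  }

  void release_exclusive_lock() {}
};

int group_snap_create(std::vector<Image> &images, const std::string &snap_name) {
  // 1) take the exclusive lock of every image in the group; once we own
  //    all of them, every writer has flushed and is paused
  for (size_t i = 0; i < images.size(); ++i) {
    int r = images[i].acquire_exclusive_lock();
    if (r < 0) {
      // roll back: release whatever we already own so IO can resume
      for (size_t j = 0; j < i; ++j) {
        images[j].release_exclusive_lock();
      }
      return r;
    }
  }

  // 2) IO across the whole group is quiesced; take the per-image snapshots
  for (size_t i = 0; i < images.size(); ++i) {
    images[i].create_snapshot(snap_name);
  }

  // 3) release the locks; the image clients re-acquire them and resume IO.
  //    If this process dies before reaching here, the normal exclusive
  //    lock recovery (breaking the dead client's lock) resumes IO, so no
  //    separate lease or heartbeat is needed.
  for (size_t i = 0; i < images.size(); ++i) {
    images[i].release_exclusive_lock();
  }
  return 0;
}

int main() {
  std::vector<Image> group;
  group.push_back(Image("image0"));
  group.push_back(Image("image1"));
  return group_snap_create(group, "group_snap_1");
}
```

The attraction is that steps 1 and 3 already exist in librbd (the
request-lock path shown earlier and the normal release/recovery path),
so the group snapshot code would mostly orchestrate them rather than
introduce new prepare/commit/abort RPC messages.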
> >> >> >>
> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx>
> >> >> >> wrote:
> >> >> >>> Gentlemen,
> >> >> >>>
> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes.
> >> >> >>>
> >> >> >>> Here is the situation. In order to take snapshots of all images in a
> >> >> >>> consistency group, we first need to quiesce all the image writers in
> >> >> >>> the consistency group. Let me call "group client" a client which
> >> >> >>> requests a consistency group to take a snapshot, and "image client"
> >> >> >>> a client that writes to an image. Let's say the group client starts
> >> >> >>> sending notify_quiesce to all image clients that write to the images
> >> >> >>> in the group. After quiescing half of the image clients, the group
> >> >> >>> client can die.
> >> >> >>>
> >> >> >>> This presents us with a dilemma - what should we do with those
> >> >> >>> quiesced image clients?
> >> >> >>>
> >> >> >>> Option 1 - wait until someone manually runs recover for that
> >> >> >>> consistency group. We can show a warning next to those unfinished
> >> >> >>> groups when the user runs the group list command. There would be a
> >> >> >>> command like group recover, which allows users to roll back
> >> >> >>> unsuccessful snapshots or continue them using the create snapshot
> >> >> >>> command.
> >> >> >>>
> >> >> >>> Option 2 - establish heartbeats between the group client and the
> >> >> >>> image clients. If the group client fails to heartbeat, the image
> >> >> >>> client unquiesces itself and continues normal operation.
> >> >> >>>
> >> >> >>> Option 3 - have a timeout on each image client. If the group client
> >> >> >>> fails to make the group snapshot within this timeout, the image
> >> >> >>> client resumes normal operation, informing the group client of the
> >> >> >>> fact.
> >> >> >>>
> >> >> >>> Which of these options do you prefer? Probably there are other
> >> >> >>> options that I am missing.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Victor.
> >> >> >>
> >> >> >> --
> >> >> >> Jason
> >> >> >
> >> >> > --
> >> >> > Jason
> >> >
> >> > --
> >> > Jason
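If it is useful for comparison, here is a rough image-client-side sketch
of the lease idea behind option 3 (and the prepare/commit flow Jason
described from the April CDM): IO is paused on a prepare request and
resumes automatically if the group client never commits or aborts within
the timeout. None of the names below are existing librbd RPCs or APIs;
they are invented for the illustration.

```
#include <chrono>
#include <iostream>
#include <thread>

// Hypothetical image-client side of a "paused IO" lease; the class and
// its methods are invented for this illustration, not librbd APIs.
class QuiesceLease {
 public:
  explicit QuiesceLease(std::chrono::seconds ttl) : ttl_(ttl), paused_(false) {}

  // on a prepare-group-snapshot request: flush, pause IO, start the clock
  void prepare() {
    paused_ = true;
    deadline_ = std::chrono::steady_clock::now() + ttl_;
    std::cout << "IO paused, lease started" << std::endl;
  }

  // checked before dispatching a write: if the lease expired without a
  // commit/abort from the group client, resume on our own
  bool io_blocked() {
    if (paused_ && std::chrono::steady_clock::now() >= deadline_) {
      paused_ = false;
      std::cout << "lease expired, resuming IO" << std::endl;
    }
    return paused_;
  }

  // on a commit- or abort-group-snapshot request from the group client
  void finish() { paused_ = false; }

 private:
  std::chrono::seconds ttl_;
  std::chrono::steady_clock::time_point deadline_;
  bool paused_;
};

int main() {
  QuiesceLease lease(std::chrono::seconds(1));
  lease.prepare();
  std::cout << "blocked: " << lease.io_blocked() << std::endl;  // still paused
  std::this_thread::sleep_for(std::chrono::seconds(2));
  std::cout << "blocked: " << lease.io_blocked() << std::endl;  // auto-resumed
  return 0;
}
```

Option 2's heartbeat variant is essentially the same idea, with the
deadline refreshed on every heartbeat instead of being fixed when the
prepare request arrives.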