On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:

> What if I'm holding this lock and somebody else is trying to reacquire
> the lock. How do I get notified about it?

The image watcher is notified, which triggers its handler:

  ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx)

The handler calls the current lock policy method `lock_requested()`,
which defines what to do with the lock request. The StandardPolicy is to
release the lock, so the lock may ping-pong between clients. You may
define a different policy -- rbd-mirror is an example where this is done.

Everywhere an operation needs the exclusive lock, we first check whether
we are currently the lock owner, i.e.:

  ictx->exclusive_lock->is_lock_owner()

and if it is false, the exclusive lock is requested. Before this check
you need to acquire ictx->owner_lock, and until you release owner_lock
you can be sure the exclusive lock will not leak to another client.
After releasing owner_lock, you will need to repeat the check when you
need it again.
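In case a sketch helps, here is a rough, self-contained illustration of
that policy idea. The `LockPolicy` interface below is a simplified
stand-in written just for this example -- it is not the actual librbd
header, and the rbd-mirror behaviour is only approximated:

```
#include <iostream>

// Simplified stand-in for the lock policy interface, for illustration
// only -- the real classes live in the librbd sources and differ in detail.
struct LockPolicy {
  virtual ~LockPolicy() {}
  // invoked from the image watcher's handler when a peer client asks us
  // to release the exclusive lock
  virtual void lock_requested(bool force) = 0;
};

// Roughly what the standard policy does: always give the lock up, so
// ownership may ping-pong between competing clients.
struct ReleasingPolicy : LockPolicy {
  void lock_requested(bool /*force*/) override {
    std::cout << "peer requested the lock -> schedule a release" << std::endl;
  }
};

// Roughly the rbd-mirror style of policy: keep the lock unless the
// request is forced, so the local client stays the single writer.
struct StickyPolicy : LockPolicy {
  void lock_requested(bool force) override {
    if (force) {
      std::cout << "forced request -> release the lock" << std::endl;
    } else {
      std::cout << "ignore the request, keep ownership" << std::endl;
    }
  }
};

int main() {
  StickyPolicy policy;
  policy.lock_requested(false);  // what the watcher handler would invoke
  policy.lock_requested(true);
  return 0;
}
```

The point is only that the decision about what to do when a peer asks
for the lock is concentrated in the policy object the image watcher
consults, so a different client (rbd-mirror, or the group snapshot code
discussed below) can plug in its own behaviour.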
--
Mykola Golub

> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@xxxxxxxxxxxx> wrote:
> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
> >> Could you please point me to the place in the source code where a
> >> writer acquires an exclusive lock on the image?
> >
> > Grep for 'exclusive_lock->request_lock'. Basically, what you need
> > (after opening the image) is:
> >
> > ```
> > C_SaferCond lock_ctx;
> > {
> >   RWLock::WLocker l(ictx->owner_lock);
> >
> >   if (ictx->exclusive_lock == nullptr) {
> >     // exclusive-lock feature is not enabled
> >     return -EINVAL;
> >   }
> >
> >   // Request the lock. If it is currently owned by another client, an
> >   // RPC message will be sent to that client asking it to release the lock.
> >   ictx->exclusive_lock->request_lock(&lock_ctx);
> > } // release owner_lock before waiting to avoid potential deadlock
> >
> > int r = lock_ctx.wait();
> > if (r < 0) {
> >   return r;
> > }
> >
> > RWLock::RLocker l(ictx->owner_lock);
> > if (ictx->exclusive_lock == nullptr ||
> >     !ictx->exclusive_lock->is_lock_owner()) {
> >   // failed to acquire the exclusive lock
> >   return -EROFS;
> > }
> >
> > // At this point the lock is acquired
> > ...
> > ```
> >
> > You might want to look at this PR
> >
> > https://github.com/ceph/ceph/pull/9592
> >
> > where we discuss adding API methods to directly acquire and release
> > the exclusive lock. You don't need the API itself, but you will find
> > examples in the patch, and also useful comments from Jason.
> >
> > --
> > Mykola Golub
> >
> >> I presume you were talking about the exclusive_lock/shared_lock
> >> feature, which can be used from the command line via the commands
> >> lock list and lock break.
> >>
> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >> > There is already a "request lock" RPC message, and it is handled
> >> > transparently within librbd when you attempt to acquire the lock
> >> > and another client owns it.
> >> >
> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> If an image already has a writer who owns the lock,
> >> >> should I implement a notification that allows asking the writer to
> >> >> release the lock, or is there already a standard way to intercept
> >> >> the exclusive lock?
> >> >>
> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> >> wrote:
> >> >> > ...
> >> >> > one more thing:
> >> >> >
> >> >> > I was also thinking that we need a new RBD feature bit to be used to
> >> >> > indicate that an image is part of a consistency group, to prevent older
> >> >> > librbd clients from removing the image or group snapshots. This could
> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can
> >> >> > still open the image R/O while it's part of a group.
> >> >> >
> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> >> > wrote:
> >> >> >> Way back in April when we had the CDM, I was originally thinking we
> >> >> >> should implement option 3. Essentially, you have a prepare group
> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller.
> >> >> >> When that lease expires, IO would automatically be resumed even if the
> >> >> >> group snapshot hasn't been created yet. This would also require
> >> >> >> commit/abort group snapshot RPC messages.
> >> >> >>
> >> >> >> However, thinking about this last night, here is another potential
> >> >> >> option:
> >> >> >>
> >> >> >> Option 4 - require images to have the exclusive lock feature before
> >> >> >> they can be added to a consistency group (and prevent disabling of
> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the
> >> >> >> rbd CLI (or a client application of the rbd consistency group snap
> >> >> >> create API), can co-operatively acquire the lock from all active image
> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and
> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal
> >> >> >> exclusive lock handling process will automatically take care of
> >> >> >> re-acquiring the lock from the dead client and resuming IO.
> >> >> >>
> >> >> >> This option not only re-uses existing code, it would also eliminate
> >> >> >> the need to add/update the RPC messages for prepare/commit/abort
> >> >> >> snapshot creation to support group snapshots (since it could all be
> >> >> >> handled internally).
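For what it's worth, the control flow of option 4 could be sketched
roughly like this. It is only an illustration of the shape of the idea:
`Image` and its member functions below are placeholders invented for the
sketch, not the real librbd image context or API.

```
#include <iostream>
#include <string>
#include <vector>

// Placeholder for an open image in the group. The member functions are
// invented for this sketch and are not real librbd calls.
struct Image {
  explicit Image(const std::string &name_) : name(name_) {}
  std::string name;

  // "request the exclusive lock and wait": the current owner is asked,
  // via the existing request-lock RPC, to flush and release, after which
  // the caller becomes the owner and the peer's IO stays paused
  int acquire_exclusive_lock() { return 0; }

  int create_snapshot(const std::string &snap_name) {
    std::cout << "snapshot " << snap_name << " of " << name << std::endl;
    return 0;
  }

  void release_exclusive_lock() {}
};

int group_snap_create(std::vector<Image> &images, const std::string &snap_name) {
  // 1) take the exclusive lock of every image in the group; once we own
  //    all of them, every writer has flushed and is paused
  for (size_t i = 0; i < images.size(); ++i) {
    int r = images[i].acquire_exclusive_lock();
    if (r < 0) {
      // roll back: release whatever we already own so IO can resume
      for (size_t j = 0; j < i; ++j) {
        images[j].release_exclusive_lock();
      }
      return r;
    }
  }

  // 2) IO across the whole group is quiesced; take the per-image snapshots
  for (size_t i = 0; i < images.size(); ++i) {
    images[i].create_snapshot(snap_name);
  }

  // 3) release the locks; the image clients re-acquire them and resume IO.
  //    If this process dies before reaching here, the normal exclusive
  //    lock recovery (breaking the dead client's lock) resumes IO, so no
  //    separate lease or heartbeat is needed.
  for (size_t i = 0; i < images.size(); ++i) {
    images[i].release_exclusive_lock();
  }
  return 0;
}

int main() {
  std::vector<Image> group;
  group.push_back(Image("image0"));
  group.push_back(Image("image1"));
  return group_snap_create(group, "group_snap_1");
}
```

The attraction is that steps 1 and 3 already exist in librbd (the
request-lock path shown earlier and the normal release/recovery path),
so the group snapshot code would mostly orchestrate them rather than
introduce new prepare/commit/abort RPC messages.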
> >> >> >>
> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx>
> >> >> >> wrote:
> >> >> >>> Gentlemen,
> >> >> >>>
> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes.
> >> >> >>>
> >> >> >>> Here is the situation. In order to take snapshots of all images in a
> >> >> >>> consistency group, we first need to quiesce all the image writers in
> >> >> >>> the consistency group. Let me call "group client" a client which
> >> >> >>> requests a consistency group to take a snapshot, and "image client"
> >> >> >>> a client that writes to an image. Let's say the group client starts
> >> >> >>> sending notify_quiesce to all image clients that write to the images
> >> >> >>> in the group. After quiescing half of the image clients, the group
> >> >> >>> client can die.
> >> >> >>>
> >> >> >>> This presents us with a dilemma - what should we do with those
> >> >> >>> quiesced image clients?
> >> >> >>>
> >> >> >>> Option 1 - wait until someone manually runs recover for that
> >> >> >>> consistency group. We can show a warning next to those unfinished
> >> >> >>> groups when the user runs the group list command. There would be a
> >> >> >>> command like group recover, which allows users to roll back
> >> >> >>> unsuccessful snapshots or continue them using the create snapshot
> >> >> >>> command.
> >> >> >>>
> >> >> >>> Option 2 - establish heartbeats between the group client and the
> >> >> >>> image clients. If the group client fails to heartbeat, the image
> >> >> >>> client unquiesces itself and continues normal operation.
> >> >> >>>
> >> >> >>> Option 3 - have a timeout on each image client. If the group client
> >> >> >>> fails to make the group snapshot within this timeout, the image
> >> >> >>> client resumes normal operation, informing the group client of the
> >> >> >>> fact.
> >> >> >>>
> >> >> >>> Which of these options do you prefer? Probably there are other
> >> >> >>> options that I am missing.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Victor.
> >> >> >>
> >> >> >> --
> >> >> >> Jason
> >> >> >
> >> >> > --
> >> >> > Jason
> >> >
> >> > --
> >> > Jason
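If it is useful for comparison, here is a rough image-client-side sketch
of the lease idea behind option 3 (and the prepare/commit flow Jason
described from the April CDM): IO is paused on a prepare request and
resumes automatically if the group client never commits or aborts within
the timeout. None of the names below are existing librbd RPCs or APIs;
they are invented for the illustration.

```
#include <chrono>
#include <iostream>
#include <thread>

// Hypothetical image-client side of a "paused IO" lease; the class and
// its methods are invented for this illustration, not librbd APIs.
class QuiesceLease {
 public:
  explicit QuiesceLease(std::chrono::seconds ttl) : ttl_(ttl), paused_(false) {}

  // on a prepare-group-snapshot request: flush, pause IO, start the clock
  void prepare() {
    paused_ = true;
    deadline_ = std::chrono::steady_clock::now() + ttl_;
    std::cout << "IO paused, lease started" << std::endl;
  }

  // checked before dispatching a write: if the lease expired without a
  // commit/abort from the group client, resume on our own
  bool io_blocked() {
    if (paused_ && std::chrono::steady_clock::now() >= deadline_) {
      paused_ = false;
      std::cout << "lease expired, resuming IO" << std::endl;
    }
    return paused_;
  }

  // on a commit- or abort-group-snapshot request from the group client
  void finish() { paused_ = false; }

 private:
  std::chrono::seconds ttl_;
  std::chrono::steady_clock::time_point deadline_;
  bool paused_;
};

int main() {
  QuiesceLease lease(std::chrono::seconds(1));
  lease.prepare();
  std::cout << "blocked: " << lease.io_blocked() << std::endl;  // still paused
  std::this_thread::sleep_for(std::chrono::seconds(2));
  std::cout << "blocked: " << lease.io_blocked() << std::endl;  // auto-resumed
  return 0;
}
```

Option 2's heartbeat variant is essentially the same idea, with the
deadline refreshed on every heartbeat instead of being fixed when the
prepare request arrives.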