Re: Snapshots of consistency groups

On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
> Could you please point me to the place in the source code where the
> writer acquires an exclusive lock on the image?

Grep for 'exclusive_lock->request_lock'. Basically, what you need
(after opening the image) is:

```
  C_SaferCond lock_ctx;
  {
    RWLock::WLocker l(ictx->owner_lock);

    if (ictx->exclusive_lock == nullptr) {
      // exclusive-lock feature is not enabled
      return -EINVAL;
    }
  
    // Request the lock. If it is currently owned by another client,
    // an RPC message will be sent asking that client to release it.
    ictx->exclusive_lock->request_lock(&lock_ctx);
  } // release owner_lock before waiting to avoid potential deadlock
  
  int r = lock_ctx.wait();
  if (r < 0) {
    return r;
  }
  
  RWLock::RLocker l(ictx->owner_lock);
  if (ictx->exclusive_lock == nullptr ||
      !ictx->exclusive_lock->is_lock_owner()) {
    // failed to acquire the exclusive lock
    return -EROFS;
  }
  
  // At this point the lock is acquired
  ...

```
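For symmetry, once the snapshot work is done you would hand the lock
back so the original owner can reacquire it and resume IO. A minimal
sketch, assuming the ExclusiveLock class in your tree exposes
release_lock() next to request_lock() -- check the header, the exact
signature and locking requirements may differ:

```
  C_SaferCond release_ctx;
  {
    RWLock::WLocker l(ictx->owner_lock);

    if (ictx->exclusive_lock == nullptr ||
        !ictx->exclusive_lock->is_lock_owner()) {
      return 0;  // nothing to release
    }

    // hand the lock back; other clients can now reacquire it
    ictx->exclusive_lock->release_lock(&release_ctx);
  } // again, drop owner_lock before waiting

  int r = release_ctx.wait();
  if (r < 0) {
    return r;
  }
```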

You might want to look at this PR

 https://github.com/ceph/ceph/pull/9592

where we discuss adding API methods to directly acquire and release
the exclusive lock. You don't need the API itself, but you will find
examples in the patch, along with useful comments from Jason.
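
For illustration only, usage of such API methods might end up looking
roughly like this -- the names below follow the PR discussion and are
an assumption on my part, not the final API:

```
#include <rbd/librbd.hpp>

// Hypothetical sketch only -- see the PR above for the actual API.
int snap_create_with_lock(librbd::Image &image, const char *snap_name) {
  // Ask librbd to cooperatively take the exclusive lock; if another
  // client owns it, that client is asked to flush and release it first.
  int r = image.lock_acquire(RBD_LOCK_MODE_EXCLUSIVE);
  if (r < 0) {
    return r;
  }

  r = image.snap_create(snap_name);

  // Give the lock back so the original writer can resume.
  image.lock_release();
  return r;
}
```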

-- 
Mykola Golub

> I presume you were talking about the exclusive_lock/shared_lock
> feature, which can be used from the command line via the lock list
> and lock break commands.
> 
> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > There is already a "request lock" RPC message, and this is handled
> > transparently within librbd when you attempt to acquire the lock while
> > another client owns it.
> >
> >
> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@xxxxxxxxxxxx> wrote:
> >>
> >> If an image already has a writer who owns the lock, should I
> >> implement a notification that asks the writer to release the lock,
> >> or is there already a standard way to intercept the exclusive lock?
> >>
> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> wrote:
> >> > ... one more thing:
> >> >
> >> > I was also thinking that we need a new RBD feature bit to indicate
> >> > that an image is part of a consistency group, to prevent older
> >> > librbd clients from removing the image or group snapshots.  This could
> >> > be an RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can
> >> > still open the image R/O while it's part of a group.
> >> >
> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@xxxxxxxxxx>
> >> > wrote:
> >> >> Way back in April when we had the CDM, I was originally thinking we
> >> >> should implement option 3. Essentially, you have a prepare group
> >> >> snapshot RPC message that extends a "paused IO" lease to the caller.
> >> >> When that lease expires, IO would automatically be resumed even if the
> >> >> group snapshot hasn't been created yet.  This would also require
> >> >> commit/abort group snapshot RPC messages.
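
Just to illustrate the lease idea above on the image-client side: when
the prepare message arrives the client pauses IO and arms a deadline,
and the IO path checks that deadline so writes resume automatically if
the group client never commits or aborts. The class below is only a
sketch of the mechanism, not existing librbd code:

```
#include <chrono>
#include <mutex>

// Hypothetical image-client side of a "paused IO" lease.
class QuiesceLease {
public:
  // Pause IO and start the lease clock; the prepare RPC handler would
  // flush in-flight IO before calling this.
  void prepare(std::chrono::seconds lease) {
    std::lock_guard<std::mutex> l(m_lock);
    m_paused = true;
    m_deadline = std::chrono::steady_clock::now() + lease;
  }

  // Commit/abort from the group client ends the lease explicitly.
  void finish() {
    std::lock_guard<std::mutex> l(m_lock);
    m_paused = false;
  }

  // Called on the IO path: true if writes may proceed, either because
  // no lease is active or because the group client let it expire.
  bool io_allowed() {
    std::lock_guard<std::mutex> l(m_lock);
    if (m_paused && std::chrono::steady_clock::now() >= m_deadline) {
      m_paused = false;  // lease expired; resume automatically
    }
    return !m_paused;
  }

private:
  std::mutex m_lock;
  bool m_paused = false;
  std::chrono::steady_clock::time_point m_deadline;
};
```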
> >> >>
> >> >> However, thinking about this last night, here is another potential
> >> >> option:
> >> >>
> >> >> Option 4 - require images to have the exclusive lock feature before
> >> >> they can be added to a consistency group (and prevent disabling of
> >> >> exclusive-lock while they are part of a group). Then librbd, via the
> >> >> rbd CLI (or client application of the rbd consistency group snap
> >> >> create API), can co-operatively acquire the lock from all active image
> >> >> clients within the group (i.e. all IO has been flushed and paused) and
> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal
> >> >> exclusive lock handling process will automatically take care of
> >> >> re-acquiring the lock from the dead client and resuming IO.
> >> >>
> >> >> This option not only re-uses existing code but would also eliminate
> >> >> the need to add/update the RPC messages for prepare/commit/abort
> >> >> snapshot creation to support group snapshots (since it could all be
> >> >> handled internally).
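
To sketch what the group-client side of option 4 could look like (the
helpers here are hypothetical, just wrappers around the request_lock()
and release sequences shown at the top of this mail, not existing
librbd calls):

```
#include <string>
#include <vector>
#include "librbd/ImageCtx.h"

// Hypothetical helpers wrapping the acquire/release sequences above.
int acquire_exclusive_lock(librbd::ImageCtx *ictx);
int release_exclusive_lock(librbd::ImageCtx *ictx);
int create_image_snapshot(librbd::ImageCtx *ictx, const std::string &name);

int group_snap_create(std::vector<librbd::ImageCtx *> &images,
                      const std::string &snap_name) {
  // 1. Cooperatively take the exclusive lock on every image in the
  //    group; this forces the active writers to flush and pause IO.
  for (auto *ictx : images) {
    int r = acquire_exclusive_lock(ictx);
    if (r < 0) {
      // any locks already taken are reclaimed through the normal
      // exclusive-lock recovery if we bail out or die here
      return r;
    }
  }

  // 2. All writers are quiesced; create the per-image snapshots.
  for (auto *ictx : images) {
    int r = create_image_snapshot(ictx, snap_name);
    if (r < 0) {
      return r;
    }
  }

  // 3. Release the locks so the image clients can reacquire them and
  //    resume IO (the same thing happens automatically if we crash).
  for (auto *ictx : images) {
    release_exclusive_lock(ictx);
  }
  return 0;
}
```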
> >> >>
> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@xxxxxxxxxxxx>
> >> >> wrote:
> >> >>> Gentlemen,
> >> >>>
> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes.
> >> >>>
> >> >>> Here is the situation. In order to take snapshots of all images in a
> >> >>> consistency group, we first need to quiesce all the image writers in
> >> >>> the group. Let me call the client which requests a snapshot of a
> >> >>> consistency group the "group client", and a client that writes to an
> >> >>> image an "image client". Let's say the group client starts sending
> >> >>> notify_quiesce to all image clients that write to images in the group.
> >> >>> After quiescing half of the image clients, the group client can die.
> >> >>>
> >> >>> This presents us with a dilemma - what should we do with those
> >> >>> quiesced image clients?
> >> >>>
> >> >>> Option 1 - wait until someone manually runs recover for that
> >> >>> consistency group.
> >> >>> We can show a warning next to those unfinished groups when the user
> >> >>> runs the group list command.
> >> >>> There would be a command like group recover, which allows users to
> >> >>> roll back unsuccessful snapshots or continue them using the create
> >> >>> snapshot command.
> >> >>>
> >> >>> Option 2 - establish heartbeats between the group client and the
> >> >>> image clients. If the group client fails to heartbeat, the image
> >> >>> client unquiesces itself and continues normal operation.
> >> >>>
> >> >>> Option 3 - have a timeout on each image client. If the group client
> >> >>> fails to take the group snapshot within this timeout, the image client
> >> >>> resumes normal operation and informs the group client of the fact.
> >> >>>
> >> >>> Which of these options do you prefer? There are probably other
> >> >>> options that I have missed.
> >> >>>
> >> >>> Thanks,
> >> >>> Victor.
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jason
> >> >
> >> >
> >> >
> >> > --
> >> > Jason
> >
> >
> >
> > --
> > Jason
> >


