Re: [PATCH 00/11 V1] rbd journaling feature

Ilya Dryomov <idryomov@xxxxxxxxx> · Mon, 7 Jan 2019 13:24:14 +0100

On Mon, Jan 7, 2019 at 3:51 AM Dongsheng Yang
<dongsheng.yang@xxxxxxxxxxxx> wrote:
>
> On 12/06/2018 12:16 AM, Jason Dillaman wrote:
> > On Wed, Dec 5, 2018 at 5:17 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >> On Wed, Dec 5, 2018 at 3:46 AM Dongsheng Yang
> >> <dongsheng.yang@xxxxxxxxxxxx> wrote:
> >>> Hi Ilya and Jason,
> >>>       Maybe there is another option, umh (usermode helper):
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/umh.h
> >>> We can provide a subcommand in userspace for journal replay, and we
> >>> can check the journal at rbd map time and when reacquiring the
> >>> exclusive lock. If we find an uncommitted entry, we can call the
> >>> userspace helper via umh to replay it.
> >> Yes, making an upcall from the kernel might be an option, but the
> >> problem is that this can happen deep in the I/O path.  I'm not sure
> >> it's safe wrt memory allocation deadlocks because the helper is run
> >> out of a regular workqueue, etc.
> >>
> >> Another option might be to daemonize "rbd map" process.
> >>
> >> Or maybe attempting a minimal replay in the kernel and going read-only
> >> in case something is wrong is actually fine as a starting point...
> > I'd vote for daemonizing "rbd map" (or a similar small, purpose-built
> > tool). The local "rbd-mirror" daemon process doesn't currently open a
> > watch on local primary images and in the case of one-way mirroring,
> > you would potentially not even have an "rbd-mirror" daemon running
> > locally.
>
> Hi Ilya and Jason,
>      When I started implementing it this way, I found a case for which
> I don't have a good solution: how do we notify the daemon to exit on
> rbd unmap?
>
> In my design, two new messages will be introduced:
>    NOTIFY_OP_JOURNAL_REPLAY_REQUEST      = 17,
>    NOTIFY_OP_JOURNAL_REPLAY_COMPLETE     = 18,
>
> and both messages carry a (tag_tid, replay_id) pair that identifies a
> unique journal_replay_request. That ensures the replay_complete we
> receive is the one we are waiting for after sending replay_request.
>
> But for rbd unmap, if we use watch-notify on this image, all of the
> rbd map daemons watching this image will get the notification.
>
> Do you have any suggestions for this case?

Hi Dongsheng,

(I'm not answering your question, but rather just writing up my
thoughts on this.)

I've thought about this and I'm not sure I like the "do everything in
the daemon" approach.  krbd is self-contained and I consider this one
of its strengths -- even mapping and unmapping images doesn't strictly
require any installed packages or dependencies.

What relying on the daemon (whether it's a daemonized "rbd map" process
or something else) for replay really means is an external dependency in
the I/O path.  A daemon can be killed at any time, OOMed, etc.  This is
a non-starter for a block device...

It seems like attempting a minimal replay in the kernel is the best we
can do in the general case.  If it fails, we return an error (map case)
or go read-only (general case).  And of course we can do a full replay
at "rbd map" time if the CLI tool is used for mapping the image.
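
To make the policy above concrete, here is a rough userspace sketch.
All names in it are hypothetical and none of it is existing krbd code;
the replay itself is stubbed out, and it only models the decision of
what to do when a minimal replay fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing krbd code. */
enum replay_ctx {
	REPLAY_AT_MAP,		/* image is being mapped */
	REPLAY_AT_RUNTIME,	/* image is already mapped */
};

struct img_state {
	bool read_only;
};

/* Stand-in for a minimal in-kernel replay: returns 0 or a negative errno. */
typedef int (*replay_fn)(struct img_state *img);

/* Stubs modelling a clean replay and one that hits an unsupported entry. */
static int replay_ok(struct img_state *img)
{
	(void)img;
	return 0;
}

static int replay_unsupported(struct img_state *img)
{
	(void)img;
	return -EOPNOTSUPP;
}

static int handle_uncommitted_journal(struct img_state *img,
				      enum replay_ctx ctx, replay_fn replay)
{
	int ret;

	assert(img != NULL && replay != NULL);

	ret = replay(img);
	if (ret == 0)
		return 0;	/* journal replayed, carry on read-write */

	if (ctx == REPLAY_AT_MAP)
		return ret;	/* map case: fail the map with the error */

	/* General case: keep the device, but stop accepting writes. */
	img->read_only = true;
	return 0;
}
```

The point of splitting on the context is that a failed replay at map
time can simply be reported to the caller, whereas a device that is
already in use has to degrade to read-only rather than disappear.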

A daemon that we can ask to do a replay for us might be an option in the
future, but only as a supplement and not as a hard dependency.

Thanks,

                Ilya