Re: [PATCH 00/11 V1] rbd journaling feature

Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx> · Tue, 8 Jan 2019 09:28:26 +0800

On 01/07/2019 08:24 PM, Ilya Dryomov wrote:
On Mon, Jan 7, 2019 at 3:51 AM Dongsheng Yang
<dongsheng.yang@xxxxxxxxxxxx> wrote:

On 12/06/2018 12:16 AM, Jason Dillaman wrote:
On Wed, Dec 5, 2018 at 5:17 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Wed, Dec 5, 2018 at 3:46 AM Dongsheng Yang
<dongsheng.yang@xxxxxxxxxxxx> wrote:
Hi Ilya and Jason,
       Maybe there is another option, umh (usermode helper):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/umh.h
We can provide a userspace subcommand for journal replay. At "rbd map"
time and whenever the exclusive lock is reacquired, we check the
journal; if we find an uncommitted entry, we call the userspace helper
via umh to replay it.
Yes, making an upcall from the kernel might be an option, but the
problem is that this can happen deep in the I/O path.  I'm not sure
it's safe wrt memory allocation deadlocks because the helper is run
out of a regular workqueue, etc.

Another option might be to daemonize "rbd map" process.

Or maybe attempting a minimal replay in the kernel and going read-only
in case something is wrong is actually fine as a starting point...
I'd vote for daemonizing "rbd map" (or a similar small, purpose-built
tool). The local "rbd-mirror" daemon process doesn't currently open a
watch on local primary images and in the case of one-way mirroring,
you would potentially not even have an "rbd-mirror" daemon running
locally.
Hi Ilya and Jason,
      When I started implementing it this way, I found a case for which
I don't have a good solution: how do we notify the daemon to exit on
rbd unmap?

In my design, two new messages will be introduced:
    NOTIFY_OP_JOURNAL_REPLAY_REQUEST      = 17,
    NOTIFY_OP_JOURNAL_REPLAY_COMPLETE     = 18,

Both messages carry a (tag_tid, replay_id) pair that identifies a unique
journal replay request. That way we can be sure that the replay_complete
we receive is the one we are waiting for after sending replay_request.
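
A minimal sketch of that matching, assuming the pair travels in the
notify payload. The struct names, the field widths and the helper
replay_complete_matches() are hypothetical and not taken from the patch
set; only the two opcode values come from the proposal above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as proposed in this mail. */
enum {
	NOTIFY_OP_JOURNAL_REPLAY_REQUEST  = 17,
	NOTIFY_OP_JOURNAL_REPLAY_COMPLETE = 18,
};

/* Hypothetical key: the pair that identifies one replay request. */
struct replay_key {
	uint64_t tag_tid;
	uint64_t replay_id;	/* width is an assumption */
};

/* Hypothetical decoded notify message. */
struct replay_notify {
	uint32_t op;
	struct replay_key key;
};

/*
 * Return true only if msg is the REPLAY_COMPLETE that answers the
 * request identified by pending.  Anything else (another opcode, or a
 * completion for a different request) is ignored by the waiter.
 */
static bool replay_complete_matches(const struct replay_key *pending,
				    const struct replay_notify *msg)
{
	return msg->op == NOTIFY_OP_JOURNAL_REPLAY_COMPLETE &&
	       msg->key.tag_tid == pending->tag_tid &&
	       msg->key.replay_id == pending->replay_id;
}
```

A waiter would keep the key of its outstanding request and drop every
notification for which this check fails.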

But for rbd unmap, if we use watch-notify on this image, every rbd map
daemon watching the image will receive the notification, not only the
one that belongs to the device being unmapped.

Do you have any suggestions for this case?
Hi Dongsheng,

(I'm not answering your question, but rather just writing up my
thoughts on this.)

I've thought about this and I'm not sure I like the "do everything in
the daemon" approach.  krbd is self-contained and I consider this one
of its strengths -- even mapping and unmapping images doesn't strictly
require any installed packages or dependencies.

What relying on the daemon (whether it's a daemonized "rbd map" process
or something else) for replay really means is an external dependency in
the I/O path.  A daemon can be killed at any time, OOMed, etc.  This is
a non-starter for a block device...

It seems like attempting a minimal replay in the kernel is the best we
can do in the general case.  If it fails, we return an error (map case)
or go read-only (general case).  And of course we can do a full replay
at "rbd map" time if the CLI tool is used for mapping the image.
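
The failure policy described above can be sketched as a small decision
function. This is only an illustration: the enum names and
after_minimal_replay() are hypothetical and do not exist in krbd.

```c
/* Where the minimal in-kernel replay was attempted (hypothetical). */
enum replay_ctx {
	REPLAY_CTX_MAP,		/* while mapping the image */
	REPLAY_CTX_RUNTIME,	/* any later point, device already in use */
};

/* What to do once the replay attempt has finished (hypothetical). */
enum replay_action {
	ACTION_CONTINUE,	/* replay succeeded, carry on */
	ACTION_FAIL_MAP,	/* return the error to the mapper */
	ACTION_GO_READ_ONLY,	/* keep the device, refuse writes */
};

/*
 * replay_err is 0 on success or a negative errno-style value.
 * On failure the map case reports the error, while the general case
 * switches the device to read-only.
 */
static enum replay_action after_minimal_replay(enum replay_ctx ctx,
					       int replay_err)
{
	if (!replay_err)
		return ACTION_CONTINUE;

	return ctx == REPLAY_CTX_MAP ? ACTION_FAIL_MAP : ACTION_GO_READ_ONLY;
}
```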

Hi Ilya,
    Thanks for your suggestion. This is the longest route but, in my
opinion, the best solution. I am glad we can go this way, starting from
a minimal replay, and I will be glad to implement the full replay step
by step in the future.

Thanks

A daemon that we can ask to do a replay for us might be an option in the
future, but only as a supplement and not as a hard dependency.

Thanks,

                 Ilya