Hi,

we ran into the same issue, and there is actually another use case: live
migration of VMs. This requires an RBD image being mapped to two clients
simultaneously, so this is intentional. If multiple clients map an image
in RW mode, the Ceph back end will cycle the write lock between the
clients to allow each of them to flush writes; this, too, is intentional.

Coordinating the clients is the orchestrator's job. In this case
specifically, the orchestrator explicitly manages a write lock during
live migration so that writes occur in the correct order. It's not a
Ceph job, it's an orchestration job. The RBD interface just provides the
tools to do it; for example, you can attach information that helps you
hunt down dead-looking clients and kill them properly before mapping an
image somewhere else.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: Thursday, May 23, 2024 2:05 PM
To: Yuma Ogami
Cc: ceph-users@xxxxxxx
Subject: Re: does the RBD client block write when the Watcher times out?

On Thu, May 23, 2024 at 4:48 AM Yuma Ogami <yuma.ogami.cybozu@xxxxxxxxx> wrote:
>
> Hello.
>
> I'm currently verifying the behavior of RBD on failure. I'm wondering
> about the consistency of RBD images after network failures. As a
> result of my investigation, I found that RBD sets a Watcher to RBD
> image if a client mounts this volume to prevent multiple mounts.

Hi Yuma,

The watcher is created to watch for updates (technically, to listen to
notifications) on the RBD image, not to prevent multiple mounts. RBD
allows the same image to be mapped multiple times on the same node or
on different nodes.

> In addition, I found that if the client is isolated from the network
> for a long time, the Watcher is released. However, the client still
> mounts this image. In this situation, if another client can also
> mount this image and the image is writable from both clients, data
> corruption occurs. Could you tell me whether this is a realistic
> scenario?

Yes, this is a realistic scenario which can occur even if the client
isn't isolated from the network. If the user does this, it's up to the
user to ensure that everything remains consistent.

One use case for mapping the same image on multiple nodes is a
clustered (also referred to as a shared-disk) filesystem, such as
OCFS2.

Thanks,

Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
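
For concreteness, the coordination tools Frank refers to are exposed
through the python-rbd bindings. The following is a minimal sketch, not
a definitive recipe, assuming reasonably recent python-rados/python-rbd
packages: it lists the watchers on an image (illustrating Ilya's point
that watchers do not prevent multiple mappings) and takes an advisory
exclusive lock with an identifying cookie before writing. The pool name
'rbd', image name 'vm-disk-0', and the cookie string are hypothetical
placeholders.

    import rados
    import rbd

    # Connect using a standard ceph.conf; adjust the path and keyring
    # for your cluster.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')  # hypothetical pool name
        try:
            with rbd.Image(ioctx, 'vm-disk-0') as image:  # hypothetical image
                # Watchers show who has the image open, but they do not
                # stop anyone else from mapping it.
                for watcher in image.watchers_list():
                    print('watcher:', watcher['addr'], watcher['cookie'])

                # Take an advisory exclusive lock. The cookie identifies
                # this client so stale lockers can be tracked down later.
                image.lock_exclusive('migration-host-a')
                try:
                    pass  # ... writes that must not interleave go here ...
                finally:
                    image.unlock('migration-host-a')

                # Before mapping the image elsewhere, inspect leftover
                # locks from dead-looking clients; each locker is a
                # (client, cookie, address) tuple.
                for client, cookie, addr in image.list_lockers().get('lockers', []):
                    print('locker:', client, cookie, addr)
                    # image.break_lock(client, cookie) would remove it.
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Note that these advisory locks are purely cooperative: librbd records
the owner and cookie but does not block I/O from other clients, which is
exactly why the ordering of writes during live migration remains the
orchestrator's responsibility.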