Hi,

we ran into the same issue, and there is actually another use case: live
migration of VMs. This requires an RBD image being mapped to two clients
simultaneously, so this is intentional. If multiple clients map an image
in RW mode, the Ceph back end will cycle the write lock between the
clients to allow each of them to flush writes; this, too, is intentional.

Coordinating the clients is the orchestrator's job. In this case
specifically, the orchestrator explicitly manages a write lock during
live migration so that writes occur in the correct order. It's not a
Ceph job, it's an orchestration job. The RBD interface just provides the
tools to do it; for example, you can attach information that helps you
hunt down dead-looking clients and kill them properly before mapping an
image somewhere else.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: Thursday, May 23, 2024 2:05 PM
To: Yuma Ogami
Cc: ceph-users@xxxxxxx
Subject: Re: does the RBD client block write when the Watcher times out?

On Thu, May 23, 2024 at 4:48 AM Yuma Ogami <yuma.ogami.cybozu@xxxxxxxxx> wrote:
>
> Hello.
>
> I'm currently verifying the behavior of RBD on failure. I'm wondering
> about the consistency of RBD images after network failures. As a
> result of my investigation, I found that RBD sets a Watcher to RBD
> image if a client mounts this volume to prevent multiple mounts.

Hi Yuma,

The watcher is created to watch for updates (technically, to listen to
notifications) on the RBD image, not to prevent multiple mounts. RBD
allows the same image to be mapped multiple times on the same node or
on different nodes.

> In addition, I found that if the client is isolated from the network
> for a long time, the Watcher is released. However, the client still
> mounts this image. In this situation, if another client can also
> mount this image and the image is writable from both clients, data
> corruption occurs. Could you tell me whether this is a realistic
> scenario?

Yes, this is a realistic scenario which can occur even if the client
isn't isolated from the network. If the user does this, it's up to the
user to ensure that everything remains consistent.

One use case for mapping the same image on multiple nodes is a
clustered (also referred to as a shared-disk) filesystem, such as
OCFS2.

Thanks,

Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
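
For concreteness, the coordination tools Frank refers to are exposed
through the python-rbd bindings. The following is a minimal sketch, not
a definitive recipe, assuming reasonably recent python-rados/python-rbd
packages: it lists the watchers on an image (illustrating Ilya's point
that watchers do not prevent multiple mappings) and takes an advisory
exclusive lock with an identifying cookie before writing. The pool name
'rbd', image name 'vm-disk-0', and the cookie string are hypothetical
placeholders.

    import rados
    import rbd

    # Connect using a standard ceph.conf; adjust the path and keyring
    # for your cluster.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')  # hypothetical pool name
        try:
            with rbd.Image(ioctx, 'vm-disk-0') as image:  # hypothetical image
                # Watchers show who has the image open, but they do not
                # stop anyone else from mapping it.
                for watcher in image.watchers_list():
                    print('watcher:', watcher['addr'], watcher['cookie'])

                # Take an advisory exclusive lock. The cookie identifies
                # this client so stale lockers can be tracked down later.
                image.lock_exclusive('migration-host-a')
                try:
                    pass  # ... writes that must not interleave go here ...
                finally:
                    image.unlock('migration-host-a')

                # Before mapping the image elsewhere, inspect leftover
                # locks from dead-looking clients; each locker is a
                # (client, cookie, address) tuple.
                for client, cookie, addr in image.list_lockers().get('lockers', []):
                    print('locker:', client, cookie, addr)
                    # image.break_lock(client, cookie) would remove it.
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Note that these advisory locks are purely cooperative: librbd records
the owner and cookie but does not block I/O from other clients, which is
exactly why the ordering of writes during live migration remains the
orchestrator's responsibility.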