Hi Ilya,

thanks a lot for the information. Yes, I was talking about the exclusive-lock feature and was under the impression that only one RBD client can get write access on connect and will keep it until disconnect. The problem we are facing with multi-VM write access is that this will inevitably corrupt the file system created on the RBD image if two instances can get write access. It's not a shared file system, it's just an XFS-formatted virtual disk.

> There is a way to disable automatic lock transitions but I don't think
> it's wired up in QEMU.

Can you point me to some documentation about that? It sounds like this is what would be needed to avoid the file system corruption in our use case. The lock transition should be initiated from the outside, and the lock should then stay fixed on the client holding it until it is instructed to give up the lock or it disconnects.

>> Is this a known problem with libceph and libvirtd?
>
> Not sure what you mean by libceph.

I simply meant that it is not a krbd client. Libvirt uses libceph (or was it librbd?) to emulate virtual drives, not krbd.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: 18 January 2023 14:26:54
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Ceph rbd clients surrender exclusive lock in critical situation

On Wed, Jan 18, 2023 at 1:19 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> we are observing a problem on a libvirt virtualisation cluster that might come from Ceph RBD clients. Something went wrong during execution of a live-migration operation, and as a result we have two instances of the same VM running on two different hosts, the source host and the destination host. What we observe now is that the exclusive lock of the RBD disk image moves between these two clients periodically (every few minutes the owner flips).
Hi Frank,

If you are talking about the RBD exclusive-lock feature ("exclusive-lock" under "features" in "rbd info" output) then this is expected. This feature provides automatic cooperative lock transitions between clients to ensure that only a single client is writing to the image at any given time. It's there to protect internal per-image data structures such as the object map, the journal or the client-side PWL (persistent write log) cache from concurrent modifications in case the image is opened by two or more clients. The name is confusing, but it's NOT about preventing other clients from opening and writing to the image. Rather, it's about serializing those writes.

>
> We are pretty sure that no virsh commands possibly having that effect are executed during this time. The client connections are not lost and the OSD blacklist is empty. I don't understand why a Ceph RBD client would surrender an exclusive lock in such a split-brain situation; that is exactly when it needs to hold on to it. As a result, the affected virtual drives are corrupted.

There is no split brain from the Ceph POV here. RBD has always supported the multiple-clients use case.

>
> The questions we have in this context are:
>
> Under what conditions does a Ceph RBD client surrender an exclusive lock?

Exclusive lock transitions are cooperative, so any time another client asks for it (not immediately though -- the current lock owner finishes processing in-flight I/O and flushes its caches first).

> Could this be a bug in the client or a Ceph config error?

Very unlikely. There is a way to disable automatic lock transitions but I don't think it's wired up in QEMU.

> Is this a known problem with libceph and libvirtd?

Not sure what you mean by libceph.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
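The behaviour discussed in this thread can be inspected from the rbd CLI, and for krbd clients the automatic lock transitions mentioned above can be disabled at map time. A minimal sketch, assuming a pool "rbd" and an image "vm-disk" (both placeholder names) and a cluster reachable with default credentials; this does not apply to the QEMU/librbd path, where no such knob is wired up:

```shell
# Check whether the image has the exclusive-lock feature enabled.
rbd info rbd/vm-disk | grep features

# Show the current holder of the managed exclusive lock; with the
# exclusive-lock feature the lock id is reported as "auto <cookie>".
rbd lock ls rbd/vm-disk

# List watchers, i.e. clients that currently have the image open.
rbd status rbd/vm-disk

# krbd only: map the image with automatic exclusive lock transitions
# disabled, so the mapping holds on to the lock instead of handing it
# over cooperatively when another client asks for it.
rbd map -o exclusive rbd/vm-disk
```

Note that `rbd lock ls` run twice a few minutes apart would show the owner flip described at the top of this thread.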