Hi Ilya,

thanks a lot for the information. Yes, I was talking about the exclusive-lock feature and was under the impression that only one RBD client can get write access on connect and will keep it until disconnect. The problem we are facing with multi-VM write access is that this will inevitably corrupt the file system created on the RBD image if two instances can get write access. It's not a shared file system, it's just an XFS-formatted virtual disk.

> There is a way to disable automatic lock transitions but I don't think
> it's wired up in QEMU.

Can you point me to some documentation about that? It sounds like this is what would be needed to avoid the file system corruption in our use case. The lock transition should be initiated from the outside, and the lock should then stay fixed on the client holding it until it is instructed to give up the lock or it disconnects.

>> Is this a known problem with libceph and libvirtd?
>
> Not sure what you mean by libceph.

I simply meant that it is not a krbd client. Libvirt uses libceph (or was it librbd?) to emulate virtual drives, not krbd.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: 18 January 2023 14:26:54
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Ceph rbd clients surrender exclusive lock in critical situation

On Wed, Jan 18, 2023 at 1:19 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> we are observing a problem on a libvirt virtualisation cluster that might come from Ceph RBD clients. Something went wrong during execution of a live-migration operation, and as a result we have two instances of the same VM running on two different hosts, the source host and the destination host. What we observe now is that the exclusive lock of the RBD disk image moves between these two clients periodically (every few minutes the owner flips).
Hi Frank,

If you are talking about the RBD exclusive-lock feature ("exclusive-lock" under "features" in "rbd info" output) then this is expected. This feature provides automatic cooperative lock transitions between clients to ensure that only a single client is writing to the image at any given time. It's there to protect internal per-image data structures such as the object map, the journal or the client-side PWL (persistent write log) cache from concurrent modifications in case the image is opened by two or more clients. The name is confusing, but it's NOT about preventing other clients from opening and writing to the image. Rather, it's about serializing those writes.

>
> We are pretty sure that no virsh commands possibly having that effect are executed during this time. The client connections are not lost and the OSD blacklist is empty. I don't understand why a Ceph RBD client would surrender an exclusive lock in such a split-brain situation; that is exactly when it needs to hold on to it. As a result, the affected virtual drives are corrupted.

There is no split brain from the Ceph POV here. RBD has always supported the multiple-clients use case.

>
> The questions we have in this context are:
>
> Under what conditions does a Ceph RBD client surrender an exclusive lock?

Exclusive lock transitions are cooperative, so any time another client asks for it (not immediately though -- the current lock owner finishes processing in-flight I/O and flushes its caches first).

> Could this be a bug in the client or a Ceph config error?

Very unlikely. There is a way to disable automatic lock transitions but I don't think it's wired up in QEMU.

> Is this a known problem with libceph and libvirtd?

Not sure what you mean by libceph.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
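The behaviour discussed in this thread can be inspected from the rbd CLI, and for krbd clients the automatic lock transitions mentioned above can be disabled at map time. A minimal sketch, assuming a pool "rbd" and an image "vm-disk" (both placeholder names) and a cluster reachable with default credentials; this does not apply to the QEMU/librbd path, where no such knob is wired up:

```shell
# Check whether the image has the exclusive-lock feature enabled.
rbd info rbd/vm-disk | grep features

# Show the current holder of the managed exclusive lock; with the
# exclusive-lock feature the lock id is reported as "auto <cookie>".
rbd lock ls rbd/vm-disk

# List watchers, i.e. clients that currently have the image open.
rbd status rbd/vm-disk

# krbd only: map the image with automatic exclusive lock transitions
# disabled, so the mapping holds on to the lock instead of handing it
# over cooperatively when another client asks for it.
rbd map -o exclusive rbd/vm-disk
```

Note that `rbd lock ls` run twice a few minutes apart would show the owner flip described at the top of this thread.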