On Fri, Jan 27, 2023 at 11:21 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Mark,
>
> thanks a lot! This seems to address the issue we observe, at least to
> a large degree.
>
> I believe we had 2 VMs running after a failed live-migration as well
> and in this case it doesn't seem like it will help. Maybe it's
> possible to add a bit of logic for this case as well (similar to
> fencing). My experience was that the write lock moves to the target
> VM and then there is a reasonable time interval before it is handed
> back. This might be a sufficient window of opportunity to hard-kill
> a VM that should not be running before it acquires the write lock
> again.
>
> Thanks for that link! A script template like that could actually be
> added to the Ceph documentation under RBD locks. It seems to be a
> really important and useful use case for image locking.

Hi Frank,

The script at [1] looks a bit suspicious to me because it uses shared
locking (--shared option) and checks whether the image is locked by
grepping "rbd lock list" output.  There are a bunch of VM states
("migrate", "prepare", etc.) and a couple of different lock IDs are
employed ("migrate", "startup", "libvirt") so I could be wrong -- such
nasty state transitions may just not be possible in libvirt -- but
considered purely in isolation the following

    function lock {
      rbd=$1
      locktype=$2
      ...
      rbd lock add $rbd $locktype --shared libvirt
    }

    if is_locked $rbd libvirt
    then
      ...
      exit 257
    fi
    lock $rbd libvirt
    < presumably VM is allowed to start >

could easily allow two VMs to start on the same $rbd image if invoked
in parallel on two different nodes: both nodes can pass the is_locked
check before either of them has taken the lock, and because the lock
is added with --shared both "rbd lock add" calls then succeed.
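Just to illustrate (this is an untested sketch, not something taken
from the script at [1], and it assumes the wrapper is handed the image
spec as its first argument): relying on the exit status of an
exclusive "rbd lock add" would close that window, because the lock
acquisition itself becomes the atomic test instead of a separate
"rbd lock list" grep:

    #!/bin/bash
    # Hypothetical pre-start wrapper: allow the VM to start only if we
    # can take an exclusive advisory lock on its RBD image.
    image=$1          # e.g. "libvirt-pool/vm-disk-1" (assumed argument)
    lock_id=libvirt

    # "rbd lock add" without --shared should fail if any advisory lock
    # already exists on the image, so there is no separate
    # check-then-lock step to race against.
    if rbd lock add "$image" "$lock_id"; then
        echo "acquired advisory lock '$lock_id' on $image, VM may start"
        exit 0
    else
        echo "$image is already locked, refusing to start VM" >&2
        exit 1
    fi

The lock would still need to be released on clean shutdown ("rbd lock
list" to find the locker, then "rbd lock rm") and cleaned up after a
crashed node, which is where the fencing logic mentioned above would
come in.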
For now, I have just updated the documentation at [2] to highlight and
warn about the automatic lock transitions behavior.

[1] https://www.wogri.at/scripts/ceph-libvirt-locking/
[2] https://docs.ceph.com/en/quincy/rbd/rbd-exclusive-locks/

Thanks,

                Ilya

> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc <Marc@xxxxxxxxxxxxxxxxx>
> Sent: 26 January 2023 18:44:41
> To: Frank Schilder; 'ceph-users@xxxxxxx'
> Subject: RE: Re: Ceph rbd clients surrender exclusive lock in critical situation
>
> > Hi all,
> >
> > we are observing a problem on a libvirt virtualisation cluster that
> > might come from ceph rbd clients. Something went wrong during execution
> > of a live-migration operation and as a result we have two instances of
> > the same VM running on 2 different hosts, the source- and the
> > destination host. What we observe now is that the exclusive lock of the
> > RBD disk image moves between these two clients periodically (every few
> > minutes the owner flips).
> >
> > Hi Frank,
> >
> > If you are talking about the RBD exclusive lock feature ("exclusive-lock"
> > under "features" in "rbd info" output) then this is expected. This
> > feature provides automatic cooperative lock transitions between clients
> > to ensure that only a single client is writing to the image at any
> > given time. It's there to protect internal per-image data structures
> > such as the object map, the journal or the client-side PWL (persistent
> > write log) cache from concurrent modifications in case the image is
> > opened by two or more clients. The name is confusing but it's NOT
> > about preventing other clients from opening and writing to the image.
> > Rather it's about serializing those writes.
>
> I can remember asking this also quite some time ago. Maybe this is helpful:
>
> https://www.wogri.at/scripts/ceph-libvirt-locking/

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx