Re: Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

Florian Haas <florian@xxxxxxxxxxxxxx> · Tue, 19 Nov 2019 22:31:49 +0100

On 19/11/2019 22:19, Jason Dillaman wrote:
> On Tue, Nov 19, 2019 at 4:09 PM Florian Haas <florian@xxxxxxxxxxxxxx> wrote:
>>
>> On 19/11/2019 21:32, Jason Dillaman wrote:
>>>> What, exactly, is the "reasonably configured hypervisor" here, in other
>>>> words, what is it that grabs and releases this lock? It's evidently not
>>>> Nova that does this, but is it libvirt, or Qemu/KVM, and if so, what
>>>> magic in there makes this happen, and what "reasonable configuration"
>>>> influences this?
>>>
>>> librbd and krbd perform this logic when the exclusive-lock feature is
>>> enabled.
>>
>> Right. So the "reasonable configuration" applies to the features they
>> enable when they *create* an image, rather than what they do to the
>> image at runtime. Is that fair to say?
> 
> The exclusive-lock ownership is enforced whenever the image is in use
> (i.e. for as long as the feature is a property of the image, not just
> at the moment the property is enabled) -- so this does cover "what
> they do to the image at runtime".

OK, gotcha.

>>> In this case, librbd sees that the previous lock owner is
>>> dead / missing, but before it can steal the lock (since librbd did not
>>> cleanly close the image), it needs to ensure it cannot come back from
>>> the dead to issue future writes against the RBD image by blacklisting
>>> it from the cluster.
>>
>> Thanks. I'm probably sounding dense here, sorry for that. This makes
>> perfect sense to me when I want to fence a whole node off. However,
>> how exactly does this work with VM recovery in place?
> 
> How would librbd / krbd know under what situation a VM was being
> "recovered"? Should librbd be expected to integrate w/ IPMI devices
> where the VM is being run or w/ Zabbix alert monitoring to know that
> this was a power failure so don't expect that the lock owner will come
> back up? The safe and generic thing for librbd / krbd to do in this
> situation is to just blacklist the old lock owner to ensure it cannot
> talk to the cluster. Obviously in the case of a physically failed
> node, that won't ever happen -- but I think we can all agree this is
> the sane recovery path that covers all bases.

Oh totally, I wasn't arguing it was a bad idea for it to do what it
does! I just got confused by the fact that our mon logs showed what
looked like a (failed) attempt to blacklist an entire client IP address.
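
One possible source of that impression is the address format itself. A
blacklist entry is written as ip:port/nonce, and an entry with a nonzero
nonce targets a single client instance, whereas port 0 with nonce 0 covers
every client on that IP. The following is a minimal sketch of that
distinction, not Ceph code: the helper names and the sample addresses are
hypothetical, and only IPv4 is handled.

```python
import re
from typing import NamedTuple


class BlacklistEntry(NamedTuple):
    """Illustrative model of an 'ip:port/nonce' blacklist address."""
    ip: str
    port: int
    nonce: int


# Accepts "ip", "ip:port" or "ip:port/nonce" (IPv4 only, for brevity).
_ENTRY_RE = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
    r"(?::(?P<port>\d+)(?:/(?P<nonce>\d+))?)?$"
)


def parse_entry(text: str) -> BlacklistEntry:
    """Split an address string into ip, port and nonce (missing parts become 0)."""
    match = _ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an ip[:port[/nonce]] address: {text!r}")
    return BlacklistEntry(
        ip=match["ip"],
        port=int(match["port"] or 0),
        nonce=int(match["nonce"] or 0),
    )


def covers_whole_ip(entry: BlacklistEntry) -> bool:
    """True only when the entry has neither a port nor a nonce set."""
    return entry.port == 0 and entry.nonce == 0
```

With these helpers, "192.0.2.10:0/3710147553" parses to a single-instance
entry even though its port is 0, while a bare "192.0.2.10" is the
whole-address case.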

> Yup, with the correct permissions librbd / rbd will be able to
> blacklist the lock owner, break the old lock, and acquire the lock
> themselves for R/W operations -- and the operator would not need to
> intervene.

Ack. Thanks!
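
The sequence described above (blacklist the old owner, break its lock,
with caps that allow blacklisting) can also be spelled out as the manual
CLI equivalents. The sketch below only builds the argument lists and does
not run anything. The client name, pool, image, lock ID and locker values
are hypothetical placeholders, and "profile rbd" is assumed to be the cap
profile in use on the cluster.

```python
from typing import List


def grant_rbd_caps_cmd(client: str, pool: str) -> List[str]:
    """Caps that let an RBD client blacklist a dead lock owner.

    Note: 'ceph auth caps' replaces all existing caps of the client,
    so any other pools it needs must be listed as well.
    """
    return ["ceph", "auth", "caps", client,
            "mon", "profile rbd",
            "osd", f"profile rbd pool={pool}"]


def list_locks_cmd(pool: str, image: str) -> List[str]:
    """Show the current lock holder (locker, lock ID, address)."""
    return ["rbd", "lock", "ls", f"{pool}/{image}"]


def blacklist_cmd(owner_addr: str) -> List[str]:
    """Fence the old owner so it cannot issue further writes."""
    return ["ceph", "osd", "blacklist", "add", owner_addr]


def break_lock_cmd(pool: str, image: str, lock_id: str, locker: str) -> List[str]:
    """Remove the stale lock once its owner has been blacklisted."""
    return ["rbd", "lock", "rm", f"{pool}/{image}", lock_id, locker]


def manual_recovery_plan(pool: str, image: str, owner_addr: str,
                         lock_id: str, locker: str) -> List[List[str]]:
    """Order matters: inspect, fence first, then break the lock."""
    return [
        list_locks_cmd(pool, image),
        blacklist_cmd(owner_addr),
        break_lock_cmd(pool, image, lock_id, locker),
    ]
```

An operator would only need these steps by hand when the client lacks the
permission to blacklist; with suitable caps, librbd performs the
equivalent on its own when it next opens the image for writing.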

Cheers,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com