Re: Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

Florian Haas <florian@xxxxxxxxxxxxxx> · Fri, 15 Nov 2019 13:32:52 +0100

On 15/11/2019 11:23, Simon Ironside wrote:
> Hi Florian,
> 
> Any chance the key your compute nodes are using for the RBD pool is
> missing 'allow command "osd blacklist"' from its mon caps?
> 
> Simon

Hi Simon,

I received this off-list, but then saw the message pop up in the list
archive, so I hope it's OK to reply on-list.

So that cap was indeed missing, thanks for the hint! However, I am still
trying to understand how this is related to the issue we saw.

The only documentation-ish article that I found about osd blacklist caps
is this:

https://access.redhat.com/solutions/3391211

We can also confirm a bunch of "access denied" messages in the mon logs
for attempts to blacklist an OSD. So the content of that article
definitely applies to our situation; I'm just not sure I follow how the
absence of that capability caused this issue.
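
For anyone wanting to audit their own keys for the same gap, here is a
minimal sketch of the check. It is a plain string test, not Ceph's real
capability parser, and the function name is made up for illustration. It
also assumes that "profile rbd" and "allow *" cover the blacklist
command; please verify that against your release.

```python
import re


def mon_caps_allow_blacklist(mon_caps: str) -> bool:
    """Rough check: do these mon caps look like they permit 'osd blacklist'?

    Hypothetical helper for illustration only; it does not reproduce
    Ceph's own capability grammar.
    """
    # Split the comma-separated caps string into individual grants.
    grants = [g.strip() for g in mon_caps.split(",") if g.strip()]
    for grant in grants:
        # The explicit grant quoted in Simon's message.
        if re.match(r'allow\s+command\s+"osd blacklist"', grant):
            return True
        # Assumption: the rbd mon profile includes the blacklist command.
        if re.fullmatch(r"profile\s+rbd", grant):
            return True
        # Assumption: a blanket grant covers every mon command.
        if re.fullmatch(r"allow\s+\*", grant):
            return True
    return False


print(mon_caps_allow_blacklist("allow r"))
print(mon_caps_allow_blacklist('allow r, allow command "osd blacklist"'))
```

The caps string to feed in is whatever "ceph auth get" reports as the
mon caps for the client key in question.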

The article talks about RBD watchers, not locks. To the best of my
knowledge, a watcher operates like a lease on the image, which is
periodically renewed. If it is not renewed within 30 seconds of client
inactivity, the cluster considers the client dead. (Please correct me if
I'm wrong.)
For us, that didn't help. We had to actively remove locks with "rbd lock
rm". Is the article using the wrong terms? Is there a link between
watchers and locks that I'm unaware of?
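
To make the distinction I am drawing concrete, here is a toy model of my
understanding as stated above. It is not how Ceph implements either
mechanism; the class and method names are invented, and the only number
taken from real life is the 30-second timeout. The watch expires on its
own, while the lock stays until something explicitly removes it, which
is what we observed.

```python
from dataclasses import dataclass, field

WATCH_TIMEOUT_S = 30.0  # the 30-second figure mentioned above


@dataclass
class ToyImage:
    """Hypothetical stand-in for an RBD image; not a Ceph API."""

    watchers: dict = field(default_factory=dict)  # client -> last renewal time
    locks: set = field(default_factory=set)       # clients holding a lock

    def renew_watch(self, client: str, now: float) -> None:
        # A live client renews its lease periodically.
        self.watchers[client] = now

    def expire_watches(self, now: float) -> None:
        # Drop every watcher whose lease was not renewed in time.
        self.watchers = {
            client: renewed
            for client, renewed in self.watchers.items()
            if now - renewed <= WATCH_TIMEOUT_S
        }

    def remove_lock(self, client: str) -> None:
        # Stands in for an explicit "rbd lock rm" by an operator.
        self.locks.discard(client)


image = ToyImage()
image.renew_watch("compute1", now=0.0)
image.locks.add("compute1")

# The node loses power; 31 seconds pass without a renewal.
image.expire_watches(now=31.0)
print("watchers:", image.watchers)
print("locks:", image.locks)
```

In this model the watcher list is empty after the timeout, but the lock
is still there until remove_lock() is called.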

Semi-relatedly, as I understand it, OSD blacklisting is based either on
an IP address or on a socket address (IP:port). While this comes in
handy for host evacuation, it doesn't help with in-place recovery (see
question 4 in my original message).

- If the blacklist is based on IP address alone (and that seems to be
what the client is attempting, based on our log messages), then it
would break recovery-in-place after a hard reboot altogether.

- Even if the client blacklisted based on an address:port pair, it
would be very unlikely, though not impossible, for an RBD client to use
the same source port when reconnecting after the node recovers in place.
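
The two cases above can be sketched as a toy matching rule. This only
models my assumption about how an IP-only entry versus an IP:port entry
would match a reconnecting client; the types, names, addresses and port
numbers are hypothetical and not taken from Ceph.

```python
from typing import Iterable, NamedTuple, Optional


class BlacklistEntry(NamedTuple):
    """Hypothetical blacklist entry; port=None means any port on that IP."""

    ip: str
    port: Optional[int] = None


def is_blacklisted(entries: Iterable[BlacklistEntry], ip: str, port: int) -> bool:
    # A client is blocked if any entry matches its IP and, where the
    # entry names a port, its source port as well.
    return any(
        e.ip == ip and (e.port is None or e.port == port) for e in entries
    )


# Before the power failure the client connected from 192.0.2.10:40000.
ip_only = [BlacklistEntry("192.0.2.10")]
ip_and_port = [BlacklistEntry("192.0.2.10", 40000)]

# After recovering in place, the same host reconnects from a new port.
print(is_blacklisted(ip_only, "192.0.2.10", 51234))      # blocked
print(is_blacklisted(ip_and_port, "192.0.2.10", 51234))  # not blocked
print(is_blacklisted(ip_and_port, "192.0.2.10", 40000))  # blocked again
```

The first case is the one that worries me for recovery-in-place; the
last is the unlikely port-reuse case.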

So I am wondering: is this incorrect documentation, or incorrect
behavior, or am I simply making dead-wrong assumptions?

Cheers,
Florian
