Re: Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

Simon Ironside <sironside@xxxxxxxxxxxxx> · Fri, 15 Nov 2019 13:27:34 +0000

Hi Florian,

On 15/11/2019 12:32, Florian Haas wrote:

> I received this off-list but then subsequently saw this message pop up
> in the list archive, so I hope it's OK to reply on-list?

Of course, I just clicked the wrong reply button the first time.

> So that cap was indeed missing, thanks for the hint! However, I am still
> trying to understand how this is related to the issue we saw.

I had exactly the same thing happen to me as happened to you a week or so
ago. A compute node lost power and, once it was restored, the VMs would
start booting but fail early on when they tried to write.

My key was also missing that cap. Adding it and resetting the affected
VMs was the only action I took to sort things out; I didn't need to go
around removing locks by hand as you did. As you say, waiting 30 seconds
didn't do any good, so it doesn't appear to be a watcher thing.

This was mentioned in the release notes for Luminous [1]. I'd missed it
too, as I redeployed with Nautilus instead and skipped these steps:

<snip>

Verify that all RBD client users have sufficient caps to blacklist other
client users. RBD client users with only "allow r" monitor caps should
be updated as follows:

# ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'

<snip>
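
In case it helps anyone checking their own keys, here is a minimal sketch
(mine, not from the release notes) of a check for the missing cap. It
assumes you pass in the mon caps string as shown by "ceph auth get
client.<ID>". The function name is made up, it is only a substring check
rather than a real caps parser, and treating "profile rbd" as sufficient
is my assumption.

```python
def mon_caps_allow_blacklist(mon_caps: str) -> bool:
    """Return True if a mon caps string appears to permit 'osd blacklist'.

    Hypothetical helper: a plain substring check, not a parser for the
    Ceph caps grammar. Broader grants such as 'allow *' are not recognised.
    """
    # Tolerate escaped quotes, as they may appear in keyring-style output.
    caps = mon_caps.replace('\\"', '"')
    # Assumption: 'profile rbd' already includes the blacklist permission.
    return 'allow command "osd blacklist"' in caps or "profile rbd" in caps


# A key with only read access to the monitors would need updating.
print(mon_caps_allow_blacklist("allow r"))
# A key updated as per the release notes passes the check.
print(mon_caps_allow_blacklist('allow r, allow command "osd blacklist"'))
```

Any user for which this returns False would be a candidate for the "ceph
auth caps" update quoted above.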

Simon

[1] https://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken