Hi Florian,
On 15/11/2019 12:32, Florian Haas wrote:
I received this off-list but then subsequently saw this message pop up
in the list archive, so I hope it's OK to reply on-list?
Of course, I just clicked the wrong reply button the first time.
So that cap was indeed missing, thanks for the hint! However, I am still
trying to understand how this is related to the issue we saw.
I had exactly the same thing happen to me as happened to you, a week or so ago.
A compute node lost power, and once power was restored the VMs would start
booting but fail early on when they tried to write.
My key was also missing that cap; adding it and resetting the affected
VMs was the only action I took to sort things out. I didn't need to go
around removing locks by hand as you did. As you say, waiting 30 seconds
didn't do any good, so it doesn't appear to be a watcher thing.
This was mentioned in the release notes for Luminous [1]. I'd missed it
too, as I redeployed with Nautilus instead and so skipped these steps:
<snip>
Verify that all RBD client users have sufficient caps to blacklist other
client users. RBD client users with only "allow r" monitor caps should
be updated as follows:
# ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'
<snip>
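If you have a lot of client keys to go through, a small dry-run helper
along these lines might save some typing. It's only a sketch:
caps_fix_cmd is a made-up name, not anything shipped with Ceph, and it
assumes you paste in each client's current mon and osd caps (e.g. as shown
by "ceph auth get client.<ID>") rather than parsing anything itself. It
prints the command from the release notes for review instead of running it,
and it deliberately only handles the plain "allow r" case the notes describe.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of Ceph).
# Usage: caps_fix_cmd CLIENT_ID MON_CAPS OSD_CAPS
# Prints the `ceph auth caps` command from the Luminous release notes for
# one client, or nothing if the mon caps already mention "osd blacklist".
# Assumes the caps strings contain no single quotes.
caps_fix_cmd() {
    client_id=$1 mon_caps=$2 osd_caps=$3
    case "$mon_caps" in
        *'osd blacklist'*)
            return 0 ;;   # already allowed to blacklist: nothing to print
    esac
    if [ "$mon_caps" != "allow r" ]; then
        # The release notes only cover plain "allow r" users; flag the rest.
        echo "client.$client_id: review mon caps by hand: $mon_caps" >&2
        return 1
    fi
    # The release notes ask for the existing OSD caps to be restated,
    # so they are passed through unchanged.
    printf "ceph auth caps client.%s mon 'allow r, allow command \"osd blacklist\"' osd '%s'\n" \
        "$client_id" "$osd_caps"
}
```

For example, caps_fix_cmd cinder 'allow r' 'allow rwx pool=volumes' prints
the release-notes command with 'allow rwx pool=volumes' filled in as the
OSD caps.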
Simon
[1] https://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com