We've seen this. Our environment isn't identical, though: we use oVirt and connect to Ceph (11.2.1) via Cinder (9.2.1). It's so rare that we've never had any luck pinpointing it, and we have far fewer VMs (<300).

Regards,
Logan

----- On Nov 29, 2017, at 7:48 AM, Wido den Hollander wido@xxxxxxxx wrote:

| Hi,
|
| On an OpenStack environment I encountered a VM which went into R/O mode after an
| RBD snapshot was created.
|
| Digging into this I found tens (out of thousands) of RBD images which DO have a
| running VM, but do NOT have a watcher on the RBD image.
|
| For example:
|
| $ rbd status volumes/volume-79773f2e-1f40-4eca-b9f0-953fa8d83086
|
| 'Watchers: none'
|
| The VM has, however, been running since September 5th 2017 with Jewel 10.2.7 on the
| client.
|
| In the meantime the cluster was already upgraded to 10.2.10.
|
| Looking further I also found a Compute node with 10.2.10 installed which also
| has RBD images without watchers.
|
| Restarting or live migrating the VM to a different host resolves this issue.
|
| The internet is full of posts where RBD images still have Watchers when people
| don't expect them, but in this case I'm expecting a watcher which isn't there.
|
| The main problem right now is that creating a snapshot potentially puts a VM in
| Read-Only state because of the lack of notification.
|
| Has anybody seen this as well?
|
| Thanks,
|
| Wido
| _______________________________________________
| ceph-users mailing list
| ceph-users@xxxxxxxxxxxxxx
| http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
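
The check in the quoted message (running `rbd status` on an image and looking for `Watchers: none`) could be scripted across a pool roughly as follows. This is only a sketch, not something either poster used. It assumes the `rbd` CLI is on the PATH with access to the cluster, and that an image with a watcher shows one or more `watcher=...` lines in the `rbd status` output. The function names `watcher_lines` and `images_without_watchers` are made up for illustration.

```python
import subprocess


def watcher_lines(status_output):
    """Return the 'watcher=...' lines found in `rbd status` text output."""
    return [
        line.strip()
        for line in status_output.splitlines()
        if line.strip().startswith("watcher=")
    ]


def images_without_watchers(pool, run=subprocess.check_output):
    """Yield image names in `pool` for which `rbd status` lists no watcher.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    # `rbd ls <pool>` prints one image name per line.
    images = run(["rbd", "ls", pool], text=True).split()
    for image in images:
        status = run(["rbd", "status", f"{pool}/{image}"], text=True)
        # 'Watchers: none' contains no 'watcher=' line, so the list is empty.
        if not watcher_lines(status):
            yield image
```

Calling `list(images_without_watchers("volumes"))` would give the candidates in the `volumes` pool. An image with no watcher is only suspicious if a VM is actually running on it, so the result still has to be cross-checked against the list of attached volumes on the OpenStack or oVirt side.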