Hello again! Unfortunately I have to raise this problem again. I have constantly hanging snapshots on several images. My Ceph version is now 0.94.5. The RBD CLI always gives me this:

root@slpeah001:[~]:# rbd snap create volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd --snap test
2016-01-13 12:04:39.107166 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:44.108783 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:49.110321 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
2016-01-13 12:04:54.112373 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected

I turned on "debug rbd = 20" and found these records only on one of the OSDs (on the same host as the RBD client):

2016-01-13 11:44:46.076780 7fb5f05d8700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6800/407122 pipe(0x392d2000 sd=257 :6804 s=2 pgs=17 cs=1 l=0 c=0x383b4160).fault with nothing to send, going to standby
2016-01-13 11:58:26.261460 7fb5efbce700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6802/407124 pipe(0x39e45000 sd=156 :6804 s=2 pgs=17 cs=1 l=0 c=0x386fbb20).fault with nothing to send, going to standby
2016-01-13 12:04:23.948931 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 44850800 notify 99720550678667 ret -110) v3 remote, 192.168.254.11:0/1468572, failed lossy con, dropping message 0x3ab76fc0
2016-01-13 12:09:04.254329 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 69846112 notify 99720550678721 ret -110) v3 remote, 192.168.254.11:0/1509673, failed lossy con, dropping message 0x3830cb40

Here are the image properties:

root@slpeah001:[~]:# rbd info volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
rbd image 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd':
        size 200 GB in 51200 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.2f2a81562fea59
        format: 2
        features: layering, striping, exclusive, object map
        flags:
        stripe unit: 4096 kB
        stripe count: 1
root@slpeah001:[~]:# rbd status volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
Watchers:
        watcher=192.168.254.17:0/2088291 client.3424561 cookie=93888518795008
root@slpeah001:[~]:# rbd lock list volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
There is 1 exclusive lock on this image.
Locker          ID                      Address
client.3424561  auto 93888518795008     192.168.254.17:0/2088291

Taking RBD snapshots from the Python API hangs as well. This image is being used by libvirt.
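For reference, a minimal Python reproduction via the standard rados/rbd bindings looks roughly like this (same pool and image as above; the snapshot name is just an example) and hangs in create_snap() the same way the CLI does:

    import rados
    import rbd

    # Connect to the cluster and open the pool that holds the image.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('volumes')

    image = rbd.Image(ioctx, 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd')
    try:
        # Hangs here, just like "rbd snap create" on the CLI.
        image.create_snap('test')
    finally:
        image.close()
        ioctx.close()
        cluster.shutdown()

Both the CLI and the binding go through librbd, so it is not surprising that they hang the same way.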
Any suggestions? Thanks!

Regards,
Vasily.

2016-01-06 1:11 GMT+08:00 Мистер Сёма <angapov@xxxxxxxxx>:
> Well, I believe the problem is no longer valid.
> My code before was:
>
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> and then snapshot creation was hanging forever. I inserted a 2-second sleep.
>
> My code after:
>
> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> sleep 2
> rbd snap create $RBD_ID --snap `date +%F-%T`
>
> And now it works perfectly. Again, I have no idea how this solved the problem.
> Thanks :)
>
> 2016-01-06 0:49 GMT+08:00 Мистер Сёма <angapov@xxxxxxxxx>:
>> I am very sorry, but I am not able to increase log verbosity because
>> it's a production cluster with very limited space for logs. Sounds
>> crazy, but that's it.
>> I have found out that the RBD snapshot process hangs forever only when
>> a QEMU fsfreeze was issued just before the snapshot. If the guest is
>> not frozen, the snapshot is taken with no problem... I have absolutely
>> no idea how these two things could be related to each other... And
>> again, this issue occurs only when there is an exclusive lock on the
>> image and the exclusive-lock feature is enabled on it.
>>
>> Does anybody else have this problem?
>>
>> 2016-01-05 2:55 GMT+08:00 Jason Dillaman <dillaman@xxxxxxxxxx>:
>>> I am surprised by the error you are seeing with exclusive lock
>>> enabled. The rbd CLI should be able to send the 'snap create' request
>>> to QEMU without an error. Are you able to provide "debug rbd = 20"
>>> logs from shortly before and after your snapshot attempt?
>>>
>>> --
>>>
>>> Jason Dillaman
>>>
>>> ----- Original Message -----
>>>> From: "Мистер Сёма" <angapov@xxxxxxxxx>
>>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>> Sent: Monday, January 4, 2016 12:37:07 PM
>>>> Subject: How to do quiesced rbd snapshot in libvirt?
>>>>
>>>> Hello,
>>>>
>>>> Can anyone please tell me the right way to do quiesced RBD snapshots
>>>> in libvirt (OpenStack)?
>>>> My Ceph version is 0.94.3.
>>>>
>>>> I found two possible ways, and neither of them works for me. I wonder
>>>> if I'm doing something wrong:
>>>>
>>>> 1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD
>>>> snapshot, then do fsThaw. This looks good, but the bad thing here is
>>>> that libvirt takes an exclusive lock on the image, which results in
>>>> errors like this when taking a snapshot: "7f359d304880 -1
>>>> librbd::ImageWatcher: no lock owners detected". It seems like the rbd
>>>> client is trying to take the snapshot on behalf of the exclusive lock
>>>> owner but is unable to find that owner. Without an exclusive lock
>>>> everything works fine.
>>>>
>>>> 2) Perform QEMU external snapshots with a local QCOW2 file overlaid
>>>> on top of the RBD image. This seems really interesting, but the bad
>>>> thing is that there is currently no way to remove this kind of
>>>> snapshot, because active blockcommit does not currently work for RBD
>>>> images (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
>>>>
>>>> So again my question is: how do you guys take quiesced RBD snapshots
>>>> in libvirt?
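For reference, option 1 (freeze via the guest agent, take the RBD snapshot, then thaw) can be sketched with the libvirt and rbd Python bindings. This is only a sketch: the domain name is made up, fsFreeze()/fsThaw() require libvirt >= 1.2.5 and a running qemu-guest-agent inside the guest, and older libvirt can issue the same guest-agent commands via virsh qemu-agent-command, as in the script quoted earlier in this thread.

    import libvirt
    import rados
    import rbd

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # made-up domain name

    # Quiesce the guest filesystems via the QEMU guest agent.
    dom.fsFreeze()
    try:
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('volumes')
        image = rbd.Image(ioctx, 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd')
        try:
            image.create_snap('quiesced-snap')   # example snapshot name
        finally:
            image.close()
            ioctx.close()
            cluster.shutdown()
    finally:
        # Always thaw, even if the snapshot fails; otherwise the guest stays frozen.
        dom.fsThaw()

    conn.close()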