Definitely would like to see the "debug rbd = 20" logs from 192.168.254.17 when this occurs. If you are co-locating your OSDs, MONs, and qemu-kvm processes, make sure your ceph.conf has "log file = </path/to/client.log>" defined in the [global] or [client] section.
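For example, something along these lines in ceph.conf on the hypervisor should be enough (just a sketch; the log path is only an illustration, so point it at a directory the qemu process can actually write to, and $pid is expanded by Ceph to the client process id):

    [client]
        log file = /var/log/qemu/qemu-guest-$pid.log
        debug rbd = 20

Keep in mind that qemu only reads ceph.conf when it opens the image, so a running guest has to be restarted (or the volume re-attached) before new log settings take effect; the rbd CLI picks them up on every invocation.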
--
Jason Dillaman

----- Original Message -----
> From: "Василий Ангапов" <angapov@xxxxxxxxx>
> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, January 13, 2016 4:22:02 AM
> Subject: Re: How to do quiesced rbd snapshot in libvirt?
>
> Hello again!
>
> Unfortunately I have to raise the problem again. I have constantly hanging snapshots on several images.
> My Ceph version is now 0.94.5.
> The RBD CLI always gives me this:
>
> root@slpeah001:[~]:# rbd snap create volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd --snap test
> 2016-01-13 12:04:39.107166 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
> 2016-01-13 12:04:44.108783 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
> 2016-01-13 12:04:49.110321 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
> 2016-01-13 12:04:54.112373 7fb70e4c2880 -1 librbd::ImageWatcher: 0x427a710 no lock owners detected
>
> I turned on "debug rbd = 20" and found these records only on one of the OSDs (on the same host as the RBD client):
>
> 2016-01-13 11:44:46.076780 7fb5f05d8700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6800/407122 pipe(0x392d2000 sd=257 :6804 s=2 pgs=17 cs=1 l=0 c=0x383b4160).fault with nothing to send, going to standby
> 2016-01-13 11:58:26.261460 7fb5efbce700 0 -- 192.168.252.11:6804/407141 >> 192.168.252.11:6802/407124 pipe(0x39e45000 sd=156 :6804 s=2 pgs=17 cs=1 l=0 c=0x386fbb20).fault with nothing to send, going to standby
> 2016-01-13 12:04:23.948931 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 44850800 notify 99720550678667 ret -110) v3 remote, 192.168.254.11:0/1468572, failed lossy con, dropping message 0x3ab76fc0
> 2016-01-13 12:09:04.254329 7fb5fede2700 0 -- 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete (2) cookie 69846112 notify 99720550678721 ret -110) v3 remote, 192.168.254.11:0/1509673, failed lossy con, dropping message 0x3830cb40
>
> Here are the image properties:
>
> root@slpeah001:[~]:# rbd info volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> rbd image 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd':
>         size 200 GB in 51200 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.2f2a81562fea59
>         format: 2
>         features: layering, striping, exclusive, object map
>         flags:
>         stripe unit: 4096 kB
>         stripe count: 1
> root@slpeah001:[~]:# rbd status volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> Watchers:
>         watcher=192.168.254.17:0/2088291 client.3424561 cookie=93888518795008
> root@slpeah001:[~]:# rbd lock list volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> There is 1 exclusive lock on this image.
> Locker          ID                    Address
> client.3424561  auto 93888518795008   192.168.254.17:0/2088291
>
> Taking RBD snapshots from the Python API is also hanging...
> This image is being used by libvirt.
>
> Any suggestions?
> Thanks!
>
> Regards, Vasily.
>
>
> 2016-01-06 1:11 GMT+08:00 Мистер Сёма <angapov@xxxxxxxxx>:
> > Well, I believe the problem is no longer valid.
> > My code before was:
> >
> > virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> > rbd snap create $RBD_ID --snap `date +%F-%T`
> >
> > and then snapshot creation was hanging forever. I inserted a 2-second sleep.
> >
> > My code after:
> >
> > virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> > sleep 2
> > rbd snap create $RBD_ID --snap `date +%F-%T`
> >
> > And now it works perfectly. Again, I have no idea how it solved the problem.
> > Thanks :)
> >
> > 2016-01-06 0:49 GMT+08:00 Мистер Сёма <angapov@xxxxxxxxx>:
> >> I am very sorry, but I am not able to increase log verbosity because it's a production cluster with very limited space for logs. Sounds crazy, but that's it.
> >> I have found out that the RBD snapshot process hangs forever only when a QEMU fsfreeze was issued just before the snapshot. If the guest is not frozen, the snapshot is taken with no problem... I have absolutely no idea how these two things could be related to each other... And again, this issue occurs only when there is an exclusive lock on the image and the exclusive lock feature is also enabled on it.
> >>
> >> Does anybody else have such a problem?
> >>
> >> 2016-01-05 2:55 GMT+08:00 Jason Dillaman <dillaman@xxxxxxxxxx>:
> >>> I am surprised by the error you are seeing with exclusive lock enabled.
> >>> The rbd CLI should be able to send the 'snap create' request to QEMU without an error. Are you able to provide "debug rbd = 20" logs from shortly before and after your snapshot attempt?
> >>>
> >>> --
> >>>
> >>> Jason Dillaman
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Мистер Сёма" <angapov@xxxxxxxxx>
> >>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> >>>> Sent: Monday, January 4, 2016 12:37:07 PM
> >>>> Subject: How to do quiesced rbd snapshot in libvirt?
> >>>>
> >>>> Hello,
> >>>>
> >>>> Can anyone please tell me what is the right way to do quiesced RBD snapshots in libvirt (OpenStack)?
> >>>> My Ceph version is 0.94.3.
> >>>>
> >>>> I found two possible ways, neither of which is working for me. Wonder if I'm doing something wrong:
> >>>>
> >>>> 1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD snapshot, do an fsThaw. Looks good, but the bad thing here is that libvirt uses an exclusive lock on the image, which results in errors like this when taking a snapshot: "7f359d304880 -1 librbd::ImageWatcher: no lock owners detected". It seems like the rbd client is trying to take the snapshot on behalf of the exclusive lock owner but is unable to find this owner. Without an exclusive lock everything works nicely.
> >>>>
> >>>> 2) Perform QEMU external snapshots with a local QCOW2 file overlaid on top of the RBD image. This seems really interesting, but the bad thing is that there is currently no way to remove this kind of snapshot because active blockcommit is not currently working for RBD images (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
> >>>>
> >>>> So again my question is: how do you guys take quiesced RBD snapshots in libvirt?
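A minimal sketch of the full freeze / snapshot / thaw sequence discussed in this thread, for reference (only an illustration: it assumes the qemu-guest-agent is running inside the guest, $INSTANCE and $RBD_ID are placeholders as in Vasily's script, and it adds the thaw step his snippet leaves out):

    #!/bin/bash
    # Quiesce the guest filesystems through the QEMU guest agent.
    virsh qemu-agent-command "$INSTANCE" '{"execute":"guest-fsfreeze-freeze"}'

    # Give the freeze a moment to settle (the workaround that made it work above).
    sleep 2

    # Take the RBD snapshot while the guest is quiesced.
    rbd snap create "$RBD_ID" --snap "$(date +%F-%T)"

    # Thaw the guest again; without this the filesystems stay frozen.
    virsh qemu-agent-command "$INSTANCE" '{"execute":"guest-fsfreeze-thaw"}'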