I verified it. After a live migration of the VM I'm able to successfully
disable fast-diff, exclusive-lock, and object-map. The problem only seems to
occur if a client connected while running Hammer without exclusive lock, then
got upgraded to Jewel, and then had exclusive lock enabled.

Greets,
Stefan

On 14.05.2017 at 19:33, Stefan Priebe - Profihost AG wrote:
> Hello Jason,
>
> On 14.05.2017 at 14:04, Jason Dillaman wrote:
>> It appears as though there is client.27994090 at 10.255.0.13 that
>> currently owns the exclusive lock on that image. I am assuming the log
>> is from "rbd feature disable"?
>
> Yes.
>
>> If so, I can see that it attempts to acquire the lock and the other
>> side is not appropriately responding to the request.
>>
>> Assuming your system is still in this state, is there any chance to
>> get debug rbd=20 logs from that client by using the client's asok file
>> and "ceph --admin-daemon /path/to/client/asok config set debug_rbd 20"
>> and re-running the attempt to disable exclusive lock?
>
> It's a VM running qemu with librbd. It seems there is no default socket,
> and I don't think there is a way to activate it later. I can try to
> activate it in ceph.conf and migrate the VM to another node. But I'm not
> sure whether the problem persists after migration or whether librbd is
> effectively reinitialized.
>
>> Also, what version of Ceph is that client running?
>
> Client and server are on Ceph 10.2.7.
>
> Greets,
> Stefan
>
>> Jason
>>
>> On Sun, May 14, 2017 at 1:55 AM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hello Jason,
>>>
>>> As it still happens and VMs are crashing, I wanted to disable
>>> exclusive-lock and fast-diff again. But I noticed that there are images
>>> where the rbd command runs in an endless loop.
>>>
>>> I canceled the command after 60s and used --debug-rbd=20. Will send the
>>> log off list.
>>>
>>> Thanks!
>>>
>>> Greets,
>>> Stefan
>>>
>>> On 13.05.2017 at 19:19, Stefan Priebe - Profihost AG wrote:
>>>> Hello Jason,
>>>>
>>>> It seems to be related to fstrim and discard. I cannot reproduce it
>>>> for images where we don't use trim - but it is still the case that it
>>>> works fine for images created with Jewel and not for images created
>>>> before Jewel. The only difference I can find is that the images
>>>> created with Jewel also support deep-flatten.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> On 11.05.2017 at 22:28, Jason Dillaman wrote:
>>>>> Assuming the only log messages you are seeing are the following:
>>>>>
>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>> object map in-memory
>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>> object map on-disk
>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>
>>>>> it looks like that can only occur if somehow the object map on disk
>>>>> is larger than the actual image size. If that's the case, how the
>>>>> image got into that state is unknown to me at this point.
>>>>>
>>>>> On Thu, May 11, 2017 at 3:23 PM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> It seems I can at least circumvent the crashes. Since I restarted
>>>>>> ALL OSDs after enabling exclusive lock and rebuilding the object
>>>>>> maps, there have been no new crashes.
>>>>>>
>>>>>> What still makes me wonder are those
>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>> messages.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
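The object-map rebuild mentioned above can be scripted across a pool much like
the feature-enable loop quoted further down. This is only a sketch, assuming
the cephstor5 pool named later in this thread; snapshots keep their own object
maps and would need an extra rebuild per image@snap:

  # Rebuild the object map of every image in the pool and print the names of
  # any images where the rebuild fails.
  for img in $(rbd -p cephstor5 ls); do
      rbd -p cephstor5 object-map rebuild "$img" || echo "rebuild failed: $img"
  done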
>>>>>> On 08.05.2017 at 14:50, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 08.05.2017 at 14:40, Jason Dillaman wrote:
>>>>>>>> You are saying that you had v2 RBD images created against Hammer
>>>>>>>> OSDs and client libraries where exclusive lock, object map, etc.
>>>>>>>> were never enabled. You then upgraded the OSDs and clients to Jewel
>>>>>>>> and at some point enabled exclusive lock (and I'd assume object
>>>>>>>> map) on these images
>>>>>>>
>>>>>>> Yes, I did:
>>>>>>> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
>>>>>>> do rbd -p cephstor5 feature enable $img
>>>>>>> exclusive-lock,object-map,fast-diff || echo $img; done
>>>>>>>
>>>>>>>> -- or were the exclusive lock and object map features already
>>>>>>>> enabled under Hammer?
>>>>>>>
>>>>>>> No, as they were not the rbd defaults.
>>>>>>>
>>>>>>>> The fact that you encountered an object map error on an export
>>>>>>>> operation is surprising to me. Does that error re-occur if you
>>>>>>>> perform the export again? If you can repeat it, it would be very
>>>>>>>> helpful if you could run the export with "--debug-rbd=20" and
>>>>>>>> capture the generated logs.
>>>>>>>
>>>>>>> No, I can't repeat it. It happens every night, but for different
>>>>>>> images, and I have never seen it for the same VM twice. If I do the
>>>>>>> export again, it works fine.
>>>>>>>
>>>>>>> I'm doing an rbd export or an rbd export-diff --from-snap, depending
>>>>>>> on the VM and the days since the last snapshot.
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm also getting these errors only for pre-Jewel images:
>>>>>>>>>
>>>>>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>>>> object map in-memory
>>>>>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>>>> object map on-disk
>>>>>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>>>>>
>>>>>>>>> while running export-diff.
>>>>>>>>>
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> On 06.05.2017 at 07:37, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hello Jason,
>>>>>>>>>>
>>>>>>>>>> While doing further testing I found that it happens only with
>>>>>>>>>> images that were created with Hammer, got upgraded to Jewel, AND
>>>>>>>>>> had exclusive lock enabled.
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> On 04.05.2017 at 14:20, Jason Dillaman wrote:
>>>>>>>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>>>>>>>> command and post the resulting log to a new ticket at [1]? I'd
>>>>>>>>>>> also be interested if you could re-create that
>>>>>>>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>>>>>>>
>>>>>>>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>> Example:
>>>>>>>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>>>>>>>> Removing image: 99% complete...
>>>>>>>>>>>>
>>>>>>>>>>>> Stuck at 99% and never completes. This is an image which got
>>>>>>>>>>>> corrupted for an unknown reason.
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
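A minimal sketch of how the log Jason asked for could be captured from the
stuck removal; the output path is only an example, and it assumes the CLI's
debug output lands on stderr (adjust if your client logging is configured to
write to a log file instead):

  # Re-run the hung removal with librbd debugging enabled and keep the output
  # for the tracker ticket.
  rbd rm cephstor2/vm-136-disk-1 --debug-rbd=20 2> /tmp/rbd-rm-vm-136-disk-1.log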
>>>>>>>>>>>> On 04.05.2017 at 08:32, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>> I'm not sure whether this is related, but our backup system
>>>>>>>>>>>>> uses rbd snapshots and sometimes reports messages like these:
>>>>>>>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>>>>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 04.05.2017 at 07:49, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since we upgraded from Hammer to Jewel 10.2.7 and enabled
>>>>>>>>>>>>>> exclusive-lock, object-map, and fast-diff, we've had problems
>>>>>>>>>>>>>> with corrupted VM filesystems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sometimes the VMs just crash with FS errors and a restart
>>>>>>>>>>>>>> solves the problem. Sometimes the whole VM is not even
>>>>>>>>>>>>>> bootable and we need to import a backup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All of them have the same problem: you can't revert to an
>>>>>>>>>>>>>> older snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this a known issue? Is there anything we can check?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
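Coming back to the lock owner mentioned at the top of the thread
(client.27994090 at 10.255.0.13): a small sketch of how the current
exclusive-lock holder and watchers could be checked before retrying the
feature disable. The image spec is taken from the rbd rm example above, and
<image-id> is a placeholder for the id from "rbd info" (block_name_prefix:
rbd_data.<image-id>):

  # The lock taken by the exclusive-lock feature shows up with an "auto ..."
  # lock id together with the owning client.<id>.
  rbd lock list cephstor2/vm-136-disk-1

  # Watchers of the image header object can be listed directly from RADOS,
  # which helps map the client.<id> back to an IP address.
  rados -p cephstor2 listwatchers rbd_header.<image-id>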