Hello Jason,

as it is still happening and VMs are crashing, I wanted to disable
exclusive-lock,fast-diff again. But I detected that there are images
where the rbd command runs in an endless loop. I canceled the command
after 60s and used --debug-rbd=20. Will send the log off-list.

Thanks!

Greets,
Stefan

On 13.05.2017 19:19, Stefan Priebe - Profihost AG wrote:
> Hello Jason,
>
> it seems to be related to fstrim and discard. I cannot reproduce it for
> images where we don't use trim - but it's still the case that it works
> fine for images created with jewel and not for images created pre
> jewel. The only difference I can find is that the images created with
> jewel also support deep-flatten.
>
> Greets,
> Stefan
>
> On 11.05.2017 22:28, Jason Dillaman wrote:
>> Assuming the only log messages you are seeing are the following:
>>
>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>> object map in-memory
>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>> object map on-disk
>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>
>> It looks like that can only occur if somehow the object map on disk is
>> larger than the actual image size. If that's the case, how the image
>> got into that state is unknown to me at this point.
>>
>> On Thu, May 11, 2017 at 3:23 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hi Jason,
>>>
>>> it seems I can at least circumvent the crashes. Since I restarted ALL
>>> OSDs after enabling exclusive-lock and rebuilding the object maps,
>>> there have been no new crashes.
>>>
>>> What still makes me wonder are those
>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>
>>> messages.
>>>
>>> Greets,
>>> Stefan
>>>
>>> On 08.05.2017 14:50, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>> On 08.05.2017 14:40, Jason Dillaman wrote:
>>>>> You are saying that you had v2 RBD images created against Hammer OSDs
>>>>> and client libraries where exclusive lock, object map, etc. were never
>>>>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>>>>> point enabled exclusive lock (and I'd assume object map) on these
>>>>> images
>>>>
>>>> Yes, I did:
>>>> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); do
>>>>   rbd -p cephstor5 feature enable $img exclusive-lock,object-map,fast-diff \
>>>>     || echo $img
>>>> done
>>>>
>>>>> -- or were the exclusive lock and object map features already
>>>>> enabled under Hammer?
>>>>
>>>> No, as they were not the rbd defaults.
>>>>
>>>>> The fact that you encountered an object map error on an export
>>>>> operation is surprising to me. Does that error recur if you
>>>>> perform the export again? If you can repeat it, it would be very
>>>>> helpful if you could run the export with "--debug-rbd=20" and capture
>>>>> the generated logs.
>>>>
>>>> No, I can't repeat it. It happens every night but for different images,
>>>> and I never saw it for the same VM twice. If I do the export again it
>>>> works fine.
>>>>
>>>> I'm doing an rbd export or an rbd export-diff --from-snap, depending on
>>>> the VM and the days since the last snapshot.
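>>>>
>>>> For reference, the nightly backup boils down to commands like these
>>>> (the image and snapshot names here are just placeholders):
>>>>
>>>> # full export of an image to a file
>>>> rbd -p cephstor5 export vm-100-disk-1 /backup/vm-100-disk-1.img
>>>> # incremental export of the changes since the previous snapshot
>>>> rbd -p cephstor5 export-diff --from-snap backup-old \
>>>>     vm-100-disk-1@backup-new /backup/vm-100-disk-1.diff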
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> also, I'm getting these errors only for pre-jewel images:
>>>>>>
>>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>> object map in-memory
>>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>> object map on-disk
>>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>>
>>>>>> while running export-diff.
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> On 06.05.2017 07:37, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello Jason,
>>>>>>>
>>>>>>> while doing further testing, it happens only with images that were
>>>>>>> created with hammer, upgraded to jewel, AND had exclusive-lock
>>>>>>> enabled.
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 04.05.2017 14:20, Jason Dillaman wrote:
>>>>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>>>>> command and post the resulting log to a new ticket at [1]? I'd also be
>>>>>>>> interested if you could re-create that
>>>>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>>>>
>>>>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>>>>
>>>>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>> Example:
>>>>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>>>>> Removing image: 99% complete...
>>>>>>>>>
>>>>>>>>> Stuck at 99% and never completes. This is an image which got corrupted
>>>>>>>>> for an unknown reason.
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> On 04.05.2017 08:32, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> I'm not sure whether this is related, but our backup system uses rbd
>>>>>>>>>> snapshots and sometimes reports messages like these:
>>>>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> On 04.05.2017 07:49, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> since we upgraded from hammer to jewel 10.2.7 and enabled
>>>>>>>>>>> exclusive-lock,object-map,fast-diff, we've had problems with
>>>>>>>>>>> corrupted VM filesystems.
>>>>>>>>>>>
>>>>>>>>>>> Sometimes the VMs just crash with FS errors and a restart can
>>>>>>>>>>> solve the problem. Sometimes the whole VM is not even bootable and we
>>>>>>>>>>> need to import a backup.
>>>>>>>>>>>
>>>>>>>>>>> All of them have the same problem that you can't revert to an older
>>>>>>>>>>> snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>>>>
>>>>>>>>>>> Is this a known issue - anything we can check?
>>>>>>>>>>>
>>>>>>>>>>> Greets,
>>>>>>>>>>> Stefan
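
PS: The object map rebuild mentioned above was done per image with
something like this (the image name is just a placeholder):

# rebuild the on-disk object map for a single image
rbd object-map rebuild cephstor5/vm-100-disk-1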