Hi Jason,

it seems I can at least circumvent the crashes: since I restarted ALL
OSDs after enabling exclusive-lock and rebuilding the object maps,
there have been no new crashes.

What still makes me wonder are those
librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
messages.
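For reference, the rebuild pass was roughly the following (a sketch, not
the exact command line I used; pool name as in the feature-enable loop
quoted below, and snapshots would need the same treatment per image@snap):

# rebuild the on-disk object map for every image in the pool
for img in $(rbd -p cephstor5 ls); do
  rbd -p cephstor5 object-map rebuild $img || echo "rebuild failed: $img"
done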
Greets,
Stefan

Am 08.05.2017 um 14:50 schrieb Stefan Priebe - Profihost AG:
> Hi,
>
> Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
>> You are saying that you had v2 RBD images created against Hammer OSDs
>> and client libraries where exclusive lock, object map, etc. were never
>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>> point enabled exclusive lock (and I'd assume object map) on these
>> images
>
> Yes, I did:
>
> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); do
>   rbd -p cephstor5 feature enable $img exclusive-lock,object-map,fast-diff \
>     || echo $img
> done
>
>> -- or were the exclusive lock and object map features already
>> enabled under Hammer?
>
> No, as they were not the rbd defaults.
>
>> The fact that you encountered an object map error on an export
>> operation is surprising to me. Does that error re-occur if you
>> perform the export again? If you can repeat it, it would be very
>> helpful if you could run the export with "--debug-rbd=20" and capture
>> the generated logs.
>
> No, I can't repeat it. It happens every night, but for different images;
> I never saw it for a VM twice. If I do the export again, it works fine.
>
> I'm doing an rbd export or an rbd export-diff --from-snap; which one
> depends on the VM and the days since the last snapshot.
>
> Greets,
> Stefan
>
>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> also, I'm getting these errors only for pre-Jewel images:
>>>
>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>> object map in-memory
>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>> object map on-disk
>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>
>>> while running export-diff.
>>>
>>> Stefan
>>>
>>> Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG:
>>>> Hello Jason,
>>>>
>>>> while doing further testing: it happens only with images that were
>>>> created with Hammer, got upgraded to Jewel, AND had exclusive-lock
>>>> enabled.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman:
>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>> command and post the resulting log to a new ticket at [1]? I'd also
>>>>> be interested if you could re-create that
>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>
>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>
>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Example:
>>>>>>
>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>> Removing image: 99% complete...
>>>>>>
>>>>>> Stuck at 99% and never completes. This is an image which got
>>>>>> corrupted for an unknown reason.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG:
>>>>>>> I'm not sure whether this is related, but our backup system uses
>>>>>>> rbd snapshots and sometimes reports messages like these:
>>>>>>>
>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>>>>>>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> since we upgraded from Hammer to Jewel 10.2.7 and enabled
>>>>>>>> exclusive-lock, object-map and fast-diff, we've had problems with
>>>>>>>> corrupted VM filesystems.
>>>>>>>>
>>>>>>>> Sometimes the VMs just crash with FS errors and a restart solves
>>>>>>>> the problem. Sometimes the whole VM is not even bootable and we
>>>>>>>> need to import a backup.
>>>>>>>>
>>>>>>>> All of them have the same problem: you can't revert to an older
>>>>>>>> snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>
>>>>>>>> Is this a known issue - anything we can check?
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
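PS: for the next nightly run I'll wrap the backup command so a failing
export leaves a log behind, along these lines (a sketch only; image and
snapshot names are placeholders, and I'm forcing the client log to
stderr so it can be redirected):

rbd export-diff --from-snap backup-2017-05-05 \
    cephstor5/vm-123-disk-1@backup-2017-05-06 vm-123.diff \
    --debug-rbd=20 --log-to-stderr=true 2> /tmp/vm-123-rbd-debug.log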