Re: corrupted rbd filesystems since jewel

Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> · Thu, 11 May 2017 21:23:35 +0200

Hi Jason,

It seems I can at least work around the crashes. Since I restarted ALL
OSDs after enabling exclusive lock and rebuilding the object maps, there
have been no new crashes.
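The object-map rebuild mentioned above can be scripted per pool. This is a minimal sketch with a few assumptions. It needs Jewel or later, where `rbd object-map rebuild` exists, and the images must already have exclusive-lock and object-map enabled. The function name `rebuild_object_maps` is hypothetical. The OSD restart is left out because it depends on how the cluster is deployed.

```shell
# Hypothetical helper: rebuild the object map of every image in a pool.
# Assumes Jewel or later and that exclusive-lock/object-map are enabled.
rebuild_object_maps() {
    local pool="$1" img failed=0
    # Plain "rbd ls" (no -l) prints one image name per line, no snapshots.
    for img in $(rbd -p "$pool" ls); do
        if ! rbd object-map rebuild "$pool/$img"; then
            echo "object-map rebuild failed: $pool/$img" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every image was rebuilt.
    [ "$failed" -eq 0 ]
}
```

Usage, with the pool name from this thread: `rebuild_object_maps cephstor5`.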

What still puzzles me are those
librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0

messages.
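One way to see which images are affected is to check the flags that `rbd info` reports. This is a rough sketch. It assumes the plain-text `rbd info` output contains a `flags:` line that mentions `object map invalid` when librbd has invalidated the map. The helper name `list_invalid_object_maps` is hypothetical.

```shell
# Hypothetical helper: print images whose object map is flagged invalid.
# Assumes "rbd info" prints a "flags:" line containing
# "object map invalid" when the object map needs a rebuild.
list_invalid_object_maps() {
    local pool="$1" img
    for img in $(rbd -p "$pool" ls); do
        if rbd info "$pool/$img" | grep -q 'flags:.*object map invalid'; then
            echo "$pool/$img"
        fi
    done
}
```

Any image it prints is a candidate for `rbd object-map rebuild`.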

Greets,
Stefan

On 08.05.2017 at 14:50, Stefan Priebe - Profihost AG wrote:
> Hi,
> On 08.05.2017 at 14:40, Jason Dillaman wrote:
>> You are saying that you had v2 RBD images created against Hammer OSDs
>> and client libraries where exclusive lock, object map, etc. were never
>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>> point enabled exclusive lock (and I'd assume object map) on these
>> images
> 
> Yes, I did:
> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); do
>     rbd -p cephstor5 feature enable "$img" exclusive-lock,object-map,fast-diff || echo "$img"
> done
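A slightly more defensive variant of that loop, offered only as a sketch. Plain `rbd ls` (without `-l`) prints only image names, one per line and without snapshots, so the `grep`/`awk` filtering is not needed. Quoting `$img` avoids word-splitting surprises. The function name `enable_features` is hypothetical.

```shell
# Hypothetical variant of the loop above: enable the features on every
# image in a pool and print the images that could not be updated.
enable_features() {
    local pool="$1" img failed=""
    for img in $(rbd -p "$pool" ls); do
        # object-map depends on exclusive-lock and fast-diff on object-map,
        # so they are listed in dependency order in a single call.
        rbd -p "$pool" feature enable "$img" exclusive-lock,object-map,fast-diff \
            || failed="$failed $img"
    done
    # Report failed images, one per line.
    for img in $failed; do
        echo "$img"
    done
}
```

As far as I know, enabling object-map on an existing image does not populate the map. It has to be rebuilt afterwards with `rbd object-map rebuild`.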
> 
>> -- or were the exclusive lock and object map features already
>> enabled under Hammer?
> 
> No, as they were not the rbd defaults.
> 
>> The fact that you encountered an object map error on an export
>> operation is surprising to me.  Does that error re-occur if you
>> perform the export again? If you can repeat it, it would be very
>> helpful if you could run the export with "--debug-rbd=20" and capture
>> the generated logs.
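Capturing such a log amounts to appending the debug option to the normal command. Below is a small wrapper sketch. `rbd_debug` is a hypothetical name. `--log-file` is assumed to be accepted on the `rbd` command line like other Ceph config overrides.

```shell
# Hypothetical wrapper: run any rbd subcommand with librbd debug logging
# (level 20) written to the log file given as the first argument.
rbd_debug() {
    local log="$1"
    shift
    rbd "$@" --debug-rbd=20 --log-file="$log"
}
```

For example: `rbd_debug /tmp/export.log export-diff --from-snap snap1 cephstor5/vm-1-disk-1@snap2 /backup/vm.diff`. The image, snapshot, and file names are placeholders.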
> 
> No, I can't repeat it. It happens every night, but for different images,
> and I have never seen it twice for the same VM. If I run the export again, it works fine.
> 
> I run either an rbd export or an rbd export-diff --from-snap; which one
> depends on the VM and the time since the last snapshot.
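The nightly job described above can be sketched as follows. The sketch assumes the new snapshot has already been created and that the caller passes the previous snapshot name, if there is one. `backup_image` and its argument order are hypothetical.

```shell
# Hypothetical helper: full export when there is no earlier snapshot,
# otherwise an incremental export-diff starting from that snapshot.
backup_image() {
    local spec="$1" new_snap="$2" out="$3" prev_snap="${4:-}"
    if [ -n "$prev_snap" ]; then
        rbd export-diff --from-snap "$prev_snap" "$spec@$new_snap" "$out"
    else
        rbd export "$spec@$new_snap" "$out"
    fi
}
```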
> 
> Greets,
> Stefan
> 
>>
>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> Also, I'm getting these errors only for pre-Jewel images:
>>>
>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1 librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating object map in-memory
>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1 librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating object map on-disk
>>> 2017-05-06 03:20:50.831250 7f7877265700 -1 librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>
>>> while running export-diff.
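To tell whether these messages come from one invalidation or several, the request pointer printed after the class name can serve as a key. This sketch assumes one log entry per line in the real log. The entries above are wrapped by the mail client. `count_invalidations` is a hypothetical name.

```shell
# Hypothetical helper: count distinct InvalidateRequest instances in a
# librbd log read from stdin, keyed by the pointer that follows
# "librbd::object_map::InvalidateRequest:".
count_invalidations() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "librbd::object_map::InvalidateRequest:")
                print $(i + 1)
    }' | sort -u | wc -l | tr -d ' '
}
```

Usage: `count_invalidations < backup.log`.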
>>>
>>> Stefan
>>>
>>> On 06.05.2017 at 07:37, Stefan Priebe - Profihost AG wrote:
>>>> Hello Jason,
>>>>
>>>> While doing further testing, I found that it happens only with images that
>>>> were created under Hammer, upgraded to Jewel, AND then had exclusive lock enabled.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman:
>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>> command and post the resulting log to a new ticket at [1]? I'd also be
>>>>> interested if you could re-create that
>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>
>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>
>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Example:
>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>> Removing image: 99% complete...
>>>>>>
>>>>>> Stuck at 99% and never completes. This is an image which got corrupted
>>>>>> for an unknown reason.
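When an operation hangs like this, it may help to attach the image's state to the report. The sketch below collects the output of `rbd info`, `rbd status` (watchers), and `rbd lock ls` for one image. `rbd_report` is a hypothetical name. It is not established that this output reveals the cause of the hang.

```shell
# Hypothetical helper: dump features/flags, watchers and advisory locks
# of one image, e.g. to attach to a tracker ticket with a debug log.
rbd_report() {
    local spec="$1" cmd
    for cmd in info status "lock ls"; do
        echo "=== rbd $cmd $spec ==="
        # $cmd is intentionally unquoted so "lock ls" splits into two words.
        rbd $cmd "$spec" 2>&1
    done
}
```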
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> On 04.05.2017 at 08:32, Stefan Priebe - Profihost AG wrote:
>>>>>>> I'm not sure whether this is related, but our backup system uses rbd
>>>>>>> snapshots and sometimes reports messages like these:
>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1 librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>>
>>>>>> On 04.05.2017 at 07:49, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Since we upgraded from Hammer to Jewel 10.2.7 and enabled
>>>>>>>> exclusive-lock,object-map,fast-diff, we've had problems with corrupted VM
>>>>>>>> filesystems.
>>>>>>>>
>>>>>>>> Sometimes the VMs just crash with FS errors and a restart fixes the
>>>>>>>> problem. Sometimes the VM does not even boot and we
>>>>>>>> need to import a backup.
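Restoring from export/export-diff backups can be sketched as below. The sketch assumes the backup chain is one full `rbd export` of snapshot `base_snap`, followed by diffs given oldest first. `rbd import` does not recreate that snapshot. As I understand it, `rbd import-diff` expects its start snapshot to exist on the target, so the sketch creates it explicitly. `restore_image` is a hypothetical name.

```shell
# Hypothetical restore helper: import a full export, recreate its
# snapshot, then replay incremental diffs in the order given.
restore_image() {
    local spec="$1" full="$2" base_snap="$3" diff
    shift 3
    rbd import "$full" "$spec" || return 1
    # import-diff checks that the diff's start snapshot exists on the target.
    rbd snap create "$spec@$base_snap" || return 1
    for diff in "$@"; do
        # import-diff creates the diff's end snapshot on the target.
        rbd import-diff "$diff" "$spec" || return 1
    done
}
```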
>>>>>>>>
>>>>>>>> All of them share the same problem: you can't revert to an older
>>>>>>>> snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>
>>>>>>>> Is this a known issue? Is there anything we can check?
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com