Re: corrupted rbd filesystems since jewel

Hello Jason,

It seems to be related to fstrim and discard. I cannot reproduce it for
images where we don't use trim. It is still the case, though, that images
created with jewel work fine while images created before jewel do not.
The only difference I can find is that the images created with jewel
also support deep-flatten.
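
To tell the two groups apart across a whole pool, a rough sketch along the following lines might help. It assumes that `rbd info` prints plain-text `features:` and `flags:` lines as the jewel CLI does, and that a missing deep-flatten feature is a usable marker for pre-jewel images, which is only the observation above, not a confirmed rule. The function name `classify_images` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the rbd CLI): for every image in a pool,
# report whether `rbd info` lists deep-flatten and whether the object map
# is flagged invalid. Assumes the plain-text `rbd info` layout with
# "features:" and "flags:" lines.
classify_images() {
    pool="$1"
    # `rbd ls` without -l prints one image name per line (no snapshots).
    rbd -p "$pool" ls | while read -r img; do
        # Keep rbd from consuming the image list on stdin.
        info=$(rbd info "$pool/$img" </dev/null) || { echo "$img info-failed"; continue; }
        features=$(printf '%s\n' "$info" | sed -n 's/^[[:space:]]*features:[[:space:]]*//p')
        flags=$(printf '%s\n' "$info" | sed -n 's/^[[:space:]]*flags:[[:space:]]*//p')
        case "$features" in
            *deep-flatten*) origin="has-deep-flatten" ;;
            *)              origin="no-deep-flatten" ;;
        esac
        case "$flags" in
            *"object map invalid"*) state="object-map-invalid" ;;
            *)                      state="object-map-ok" ;;
        esac
        echo "$img $origin $state"
    done
}
```

Calling `classify_images cephstor5` would print one line per image. Images reported as `object-map-invalid` are candidates for `rbd object-map rebuild`.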

Greets,
Stefan

On 11.05.2017 at 22:28, Jason Dillaman wrote:
> Assuming the only log messages you are seeing are the following:
> 
> 2017-05-06 03:20:50.830626 7f7876a64700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
> object map in-memory
> 2017-05-06 03:20:50.830634 7f7876a64700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
> object map on-disk
> 2017-05-06 03:20:50.831250 7f7877265700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
> 
> It looks like that can only occur if somehow the object-map on disk is
> larger than the actual image size. If that's the case, how the image
> got into that state is unknown to me at this point.
> 
> On Thu, May 11, 2017 at 3:23 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> Hi Jason,
>>
>> It seems I can at least circumvent the crashes. Since I restarted ALL
>> OSDs after enabling exclusive lock and rebuilding the object maps, there
>> have been no new crashes.
>>
>> What still makes me wonder are those
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>
>> messages.
>>
>> Greets,
>> Stefan
>>
>> On 08.05.2017 at 14:50, Stefan Priebe - Profihost AG wrote:
>>> Hi,
>>> On 08.05.2017 at 14:40, Jason Dillaman wrote:
>>>> You are saying that you had v2 RBD images created against Hammer OSDs
>>>> and client libraries where exclusive lock, object map, etc were never
>>>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>>>> point enabled exclusive lock (and I'd assume object map) on these
>>>> images
>>>
>>> Yes, I did:
>>> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
>>> do rbd -p cephstor5 feature enable $img
>>> exclusive-lock,object-map,fast-diff || echo $img; done
>>>
>>>> -- or were the exclusive lock and object map features already
>>>> enabled under Hammer?
>>>
>>> No, as they were not the rbd defaults.
>>>
>>>> The fact that you encountered an object map error on an export
>>>> operation is surprising to me.  Does that error re-occur if you
>>>> perform the export again? If you can repeat it, it would be very
>>>> helpful if you could run the export with "--debug-rbd=20" and capture
>>>> the generated logs.
>>>
>>> No, I can't repeat it. It happens every night, but for different images.
>>> I have never seen it twice for the same VM. If I run the export again, it
>>> works fine.
>>>
>>> I'm doing either an rbd export or an rbd export-diff --from-snap, depending
>>> on the VM and how long ago the last snapshot was taken.
>>>
>>> Greets,
>>> Stefan
>>>
>>>>
>>>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> Also, I'm getting these errors only for pre-jewel images:
>>>>>
>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>> object map in-memory
>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>> object map on-disk
>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>
>>>>> while running export-diff.
>>>>>
>>>>> Stefan
>>>>>
>>>>> On 06.05.2017 at 07:37, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello Jason,
>>>>>>
>>>>>> further testing shows it happens only with images that were created with
>>>>>> hammer, got upgraded to jewel AND had exclusive lock enabled.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> On 04.05.2017 at 14:20, Jason Dillaman wrote:
>>>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>>>> command and post the resulting log to a new ticket at [1]? I'd also be
>>>>>>> interested if you could re-create that
>>>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>>>
>>>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>>>
>>>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>> Example:
>>>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>>>> Removing image: 99% complete...
>>>>>>>>
>>>>>>>> Stuck at 99% and never completes. This is an image which got corrupted
>>>>>>>> for an unknown reason.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> On 04.05.2017 at 08:32, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> I'm not sure whether this is related, but our backup system uses rbd
>>>>>>>>> snapshots and sometimes reports messages like these:
>>>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>>>
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 04.05.2017 at 07:49, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> since we upgraded from hammer to jewel 10.2.7 and enabled
>>>>>>>>>> exclusive-lock,object-map,fast-diff we have had problems with
>>>>>>>>>> corrupted VM filesystems.
>>>>>>>>>>
>>>>>>>>>> Sometimes the VMs are just crashing with FS errors and a restart can
>>>>>>>>>> solve the problem. Sometimes the whole VM is not even bootable and we
>>>>>>>>>> need to import a backup.
>>>>>>>>>>
>>>>>>>>>> All of them have the same problem that you can't revert to an older
>>>>>>>>>> snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>>>
>>>>>>>>>> Is this a known issue? Is there anything we can check?
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list
>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
> 
> 
> 


