Re: corrupted rbd filesystems since jewel

Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> · Mon, 15 May 2017 19:29:40 +0200

Hello Jason,
> Just so I can attempt to repeat this:

Thanks.

> (1) you had an image that was built using Hammer clients and OSDs with
> exclusive lock disabled
Yes. It was created with the hammer rbd defaults.

> (2) you updated your clients and OSDs to Jewel
> (3) you restarted your OSDs and live-migrated your VMs to pick up the
> Jewel changes

No. I updated only the clients first and live-migrated all VMs so that
they load the jewel librbd.

After that I updated and restarted the mons, and then updated and
restarted the osds.
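
One way to double-check that a running qemu process really picked up the
new librbd after a migration is to look at its memory maps: on Linux, a
process that still uses a library file that was replaced on disk normally
shows that file as "(deleted)". This is only a minimal sketch; the function
name and the optional maps-file argument are made up for illustration.

```shell
# Hypothetical helper: succeeds if the process still maps a librbd shared
# object that has since been replaced on disk.
# $1 = PID, $2 = optional maps file (defaults to /proc/PID/maps).
uses_stale_librbd() {
    maps_file="${2:-/proc/$1/maps}"
    grep -q 'librbd\.so.*(deleted)' "$maps_file"
}
```

It could be called once per qemu PID on a host, for example as
uses_stale_librbd "$pid" && echo "$pid still maps the old librbd".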

> (4) you enabled exclusive-lock, object-map, and fast-diff on a running VM
Yes.

> (5) you rebuilt the image's object map (while the VM was running?)
Yes.

> (6) things started breaking at this point
Yes, but not on all VMs and only while creating and deleting snapshots.
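
For anyone trying to repeat steps (4) to (6), the sequence can be sketched
roughly as below. It is not the exact set of commands used here: the
function names are made up, the image spec is a placeholder, and the check
assumes that "rbd info" reports an invalidated object map on its "flags:"
line.

```shell
# Steps (4) and (5): enable the features on an existing image, then
# rebuild its object map. $1 = image spec such as pool/image (placeholder).
enable_and_rebuild() {
    rbd feature enable "$1" exclusive-lock object-map fast-diff &&
    rbd object-map rebuild "$1"
}

# Step (6) check: succeeds if "rbd info" flags the object map of image $1
# as invalid (assumed output format, see above).
object_map_invalid() {
    rbd info "$1" | grep -q 'flags:.*object map invalid'
}
```

Running object_map_invalid after each snapshot create or delete on a test
image might help to narrow down when the map gets invalidated.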

Greets,
Stefan

> 
> On Sun, May 14, 2017 at 1:42 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> I verified it. After a live migration of the VM I'm able to successfully
>> disable fast-diff,exclusive-lock,object-map.
>>
>> The problem only seems to occur if a client connected to hammer
>> without exclusive lock, then got upgraded to jewel, and exclusive
>> lock was enabled afterwards.
>>
>> Greets,
>> Stefan
>>
>> Am 14.05.2017 um 19:33 schrieb Stefan Priebe - Profihost AG:
>>> Hello Jason,
>>>
>>> Am 14.05.2017 um 14:04 schrieb Jason Dillaman:
>>>> It appears as though there is client.27994090 at 10.255.0.13 that
>>>> currently owns the exclusive lock on that image. I am assuming the log
>>>> is from "rbd feature disable"?
>>> Yes.
>>>
>>>> If so, I can see that it attempts to
>>>> acquire the lock and the other side is not appropriately responding to
>>>> the request.
>>>>
>>>> Assuming your system is still in this state, is there any chance to
>>>> get debug rbd=20 logs from that client by using the client's asok file
>>>> and "ceph --admin-daemon /path/to/client/asok config set debug_rbd 20"
>>>> and re-run the attempt to disable exclusive lock?
>>>
>>> It's a VM running qemu with librbd. It seems there is no default socket.
>>> Unless there is a way to activate it later, I don't think so. I can try to
>>> activate it in ceph.conf and migrate the VM to another node. But I'm not
>>> sure whether the problem persists after migration or whether librbd gets
>>> more or less reinitialized.
>>>
>>>> Also, what version of Ceph is that client running?
>>> Client and Server are on ceph 10.2.7.
>>>
>>> Greets,
>>> Stefan
>>>
>>>> Jason
>>>>
>>>> On Sun, May 14, 2017 at 1:55 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Hello Jason,
>>>>>
>>>>> as it still happens and VMs are crashing, I wanted to disable
>>>>> exclusive-lock,fast-diff again. But I found that there are images
>>>>> where the rbd command runs in an endless loop.
>>>>>
>>>>> I canceled the command after 60s and used --debug-rbd=20. Will send the
>>>>> log off list.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> Am 13.05.2017 um 19:19 schrieb Stefan Priebe - Profihost AG:
>>>>>> Hello Jason,
>>>>>>
>>>>>> it seems to be related to fstrim and discard. I cannot reproduce it for
>>>>>> images where we don't use trim - but it still holds that it works
>>>>>> fine for images created with jewel and not for pre-jewel images.
>>>>>> The only difference I can find is that the images created with jewel
>>>>>> also support deep-flatten.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> Am 11.05.2017 um 22:28 schrieb Jason Dillaman:
>>>>>>> Assuming the only log messages you are seeing are the following:
>>>>>>>
>>>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>> object map in-memory
>>>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>> object map on-disk
>>>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>>>
>>>>>>> It looks like that can only occur if somehow the object-map on disk is
>>>>>>> larger than the actual image size. If that's the case, how the image
>>>>>>> got into that state is unknown to me at this point.
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 3:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> it seems I can at least circumvent the crashes. Since I restarted ALL
>>>>>>>> osds after enabling exclusive lock and rebuilding the object maps,
>>>>>>>> there have been no new crashes.
>>>>>>>>
>>>>>>>> What still makes me wonder are those
>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>>>>
>>>>>>>> messages.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 08.05.2017 um 14:50 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>> Hi,
>>>>>>>>> Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
>>>>>>>>>> You are saying that you had v2 RBD images created against Hammer OSDs
>>>>>>>>>> and client libraries where exclusive lock, object map, etc were never
>>>>>>>>>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>>>>>>>>>> point enabled exclusive lock (and I'd assume object map) on these
>>>>>>>>>> images
>>>>>>>>>
>>>>>>>>> Yes, I did:
>>>>>>>>> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); do
>>>>>>>>>   rbd -p cephstor5 feature enable "$img" \
>>>>>>>>>     exclusive-lock,object-map,fast-diff || echo "$img"
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>>> -- or were the exclusive lock and object map features already
>>>>>>>>>> enabled under Hammer?
>>>>>>>>>
>>>>>>>>> No, as they were not the rbd defaults.
>>>>>>>>>
>>>>>>>>>> The fact that you encountered an object map error on an export
>>>>>>>>>> operation is surprising to me.  Does that error re-occur if you
>>>>>>>>>> perform the export again? If you can repeat it, it would be very
>>>>>>>>>> helpful if you could run the export with "--debug-rbd=20" and capture
>>>>>>>>>> the generated logs.
>>>>>>>>>
>>>>>>>>> No, I can't repeat it. It happens every night but for different images,
>>>>>>>>> and I never saw it twice for the same VM. If I do the export again it
>>>>>>>>> works fine.
>>>>>>>>>
>>>>>>>>> I'm doing either an rbd export or an rbd export-diff --from-snap,
>>>>>>>>> depending on the VM and the days since the last snapshot.
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> also, I'm getting these errors only for pre-jewel images:
>>>>>>>>>>>
>>>>>>>>>>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>>>>>> object map in-memory
>>>>>>>>>>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>>>>>>>>>>> object map on-disk
>>>>>>>>>>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>>>>>>>>>>
>>>>>>>>>>> while running export-diff.
>>>>>>>>>>>
>>>>>>>>>>> Stefan
>>>>>>>>>>>
>>>>>>>>>>> Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>> Hello Jason,
>>>>>>>>>>>>
>>>>>>>>>>>> after further testing: it happens only with images that were created
>>>>>>>>>>>> with hammer, got upgraded to jewel AND had exclusive lock enabled.
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman:
>>>>>>>>>>>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>>>>>>>>>>>> command and post the resulting log to a new ticket at [1]? I'd also be
>>>>>>>>>>>>> interested if you could re-create that
>>>>>>>>>>>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>> # rbd rm cephstor2/vm-136-disk-1
>>>>>>>>>>>>>> Removing image: 99% complete...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stuck at 99% and never completes. This is an image which got corrupted
>>>>>>>>>>>>>> for an unknown reason.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>> I'm not sure whether this is related, but our backup system uses rbd
>>>>>>>>>>>>>>> snapshots and sometimes reports messages like these:
>>>>>>>>>>>>>>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>>>>>>>>>>>>>>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> since we upgraded from hammer to jewel 10.2.7 and enabled
>>>>>>>>>>>>>>>> exclusive-lock,object-map,fast-diff we have problems with corrupted VM
>>>>>>>>>>>>>>>> filesystems.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sometimes the VMs are just crashing with FS errors and a restart can
>>>>>>>>>>>>>>>> solve the problem. Sometimes the whole VM is not even bootable and we
>>>>>>>>>>>>>>>> need to import a backup.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All of them have the same problem that you can't revert to an older
>>>>>>>>>>>>>>>> snapshot. The rbd command just hangs at 99% forever.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is this a known issue - anything we can check?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
> 
> 
> 