Re: corrupted rbd filesystems since jewel

Does your ceph status show pg 2.cebed0aa (still) scrubbing? Sure -- I
can quickly scan the new log if you send it directly to me.
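
For reference, something like this should show the PG's state (a
sketch -- substitute your own pool/PG id as needed):

# ceph pg dump pgs_brief | grep ^2.cebed0aa
# ceph -s

If the PG is still (deep-)scrubbing, that would match the theory that
the delete op is queued behind the scrub.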

On Wed, May 17, 2017 at 2:18 PM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
> I can send the OSD log - if you want?
>
> Stefan
>
> Am 17.05.2017 um 20:13 schrieb Stefan Priebe - Profihost AG:
>> Hello Jason,
>>
>> the command
>> # rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0000000000000aaa
>>
>> hangs as well. Doing absolutely nothing... waiting forever.
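>>
>> If it helps, I can also dump the in-flight ops on the OSD side, e.g.
>> (a sketch; run on the OSD host, assuming osd.23 is the primary for
>> this object):
>>
>> # ceph daemon osd.23 dump_ops_in_flight
>> # ceph daemon osd.23 dump_historic_ops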
>>
>> Greets,
>> Stefan
>>
>> Am 17.05.2017 um 17:05 schrieb Jason Dillaman:
>>> OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa is
>>> waiting for a scrub. What happens if you run "rados -p <rbd pool> rm
>>> rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the OSD 23 logs
>>> during this)? If that succeeds while your VM remains blocked on that
>>> remove op, it looks like there is some problem in the OSD where ops
>>> queued on a scrub are not properly awoken when the scrub completes.
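>>>
>>> A sketch of the capture sequence (assuming you can inject args at
>>> runtime; adjust the pool name):
>>>
>>> # ceph tell osd.23 injectargs '--debug-osd 20'
>>> # rados -p <rbd pool> rm rbd_data.21aafa6b8b4567.0000000000000aaa
>>> # ceph tell osd.23 injectargs '--debug-osd 0/5'
>>>
>>> and then grab /var/log/ceph/ceph-osd.23.log from the OSD host.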
>>>
>>> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> Hello Jason,
>>>>
>>>> after enabling the log and generating a gcore dump, the request was
>>>> successful ;-(
>>>>
>>>> So the log only contains the successful request; that is all I was
>>>> able to catch. I can send you the log on request.
>>>>
>>>> Luckily I had another VM on another cluster behaving the same way.
>>>>
>>>> This time osd.23:
>>>> # ceph --admin-daemon
>>>> /var/run/ceph/ceph-client.admin.22969.140085040783360.asok
>>>> objecter_requests
>>>> {
>>>>     "ops": [
>>>>         {
>>>>             "tid": 18777,
>>>>             "pg": "2.cebed0aa",
>>>>             "osd": 23,
>>>>             "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>             "object_locator": "@2",
>>>>             "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>             "target_object_locator": "@2",
>>>>             "paused": 0,
>>>>             "used_replica": 0,
>>>>             "precalc_pgid": 0,
>>>>             "last_sent": "1.83513e+06s",
>>>>             "attempts": 1,
>>>>             "snapid": "head",
>>>>             "snap_context": "28a43=[]",
>>>>             "mtime": "2017-05-17 16:51:06.0.455475s",
>>>>             "osd_ops": [
>>>>                 "delete"
>>>>             ]
>>>>         }
>>>>     ],
>>>>     "linger_ops": [
>>>>         {
>>>>             "linger_id": 1,
>>>>             "pg": "2.f0709c34",
>>>>             "osd": 23,
>>>>             "object_id": "rbd_header.21aafa6b8b4567",
>>>>             "object_locator": "@2",
>>>>             "target_object_id": "rbd_header.21aafa6b8b4567",
>>>>             "target_object_locator": "@2",
>>>>             "paused": 0,
>>>>             "used_replica": 0,
>>>>             "precalc_pgid": 0,
>>>>             "snapid": "head",
>>>>             "registered": "1"
>>>>         }
>>>>     ],
>>>>     "pool_ops": [],
>>>>     "pool_stat_ops": [],
>>>>     "statfs_ops": [],
>>>>     "command_ops": []
>>>> }
>>>>
>>>> OSD Logfile of OSD 23 attached.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 17.05.2017 um 16:26 schrieb Jason Dillaman:
>>>>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> You mean the request, no matter whether it is successful or not?
>>>>>> Which log level should be set to 20?
>>>>>
>>>>>
>>>>> I'm hoping you can re-create the hung remove op when OSD logging is
>>>>> increased -- "debug osd = 20" would be nice if you can turn it up that
>>>>> high while attempting to capture the blocked op.
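>>>>>
>>>>> E.g. in ceph.conf on the OSD host (a sketch; it can also be injected
>>>>> at runtime with "ceph tell osd.N injectargs '--debug-osd 20'"):
>>>>>
>>>>> [osd]
>>>>>     debug osd = 20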
>>>>>
>>>
>>>
>>>



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


