Re: corrupted rbd filesystems since jewel

OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa is
waiting for a scrub. What happens if you run "rados -p <rbd pool> rm
rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the OSD 23 logs
during this)? If that succeeds while your VM remains blocked on that
remove op, it looks like there is some problem in the OSD where ops
queued on a scrub are not properly awoken when the scrub completes.
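
For reference, a rough sequence for that test would be something like the
following (the log path assumes a default deployment and "<rbd pool>" is a
placeholder, so adjust both as needed):

  ceph tell osd.23 injectargs '--debug-osd 20'
  rados -p <rbd pool> rm rbd_data.21aafa6b8b4567.0000000000000aaa
  tail -f /var/log/ceph/ceph-osd.23.log          (watch the delete arrive)
  ceph tell osd.23 injectargs '--debug-osd 0/5'  (or whatever your previous level was)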

On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
> Hello Jason,
>
> after enabling the log and generating a gcore dump, the request went
> through successfully ;-(
>
> So the log only contains the successful request; I can send it to you on
> request.
>
> Luckily, I had another VM on another cluster showing the same behaviour.
>
> This time it is osd.23:
> # ceph --admin-daemon \
>     /var/run/ceph/ceph-client.admin.22969.140085040783360.asok \
>     objecter_requests
> {
>     "ops": [
>         {
>             "tid": 18777,
>             "pg": "2.cebed0aa",
>             "osd": 23,
>             "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>             "object_locator": "@2",
>             "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>             "target_object_locator": "@2",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "last_sent": "1.83513e+06s",
>             "attempts": 1,
>             "snapid": "head",
>             "snap_context": "28a43=[]",
>             "mtime": "2017-05-17 16:51:06.0.455475s",
>             "osd_ops": [
>                 "delete"
>             ]
>         }
>     ],
>     "linger_ops": [
>         {
>             "linger_id": 1,
>             "pg": "2.f0709c34",
>             "osd": 23,
>             "object_id": "rbd_header.21aafa6b8b4567",
>             "object_locator": "@2",
>             "target_object_id": "rbd_header.21aafa6b8b4567",
>             "target_object_locator": "@2",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "snapid": "head",
>             "registered": "1"
>         }
>     ],
>     "pool_ops": [],
>     "pool_stat_ops": [],
>     "statfs_ops": [],
>     "command_ops": []
> }
>
> The log file of osd.23 is attached.
>
> Greets,
> Stefan
>
> On 17.05.2017 at 16:26, Jason Dillaman wrote:
>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> You mean the request, regardless of whether it is successful? Which log
>>> level should be set to 20?
>>
>>
>> I'm hoping you can re-create the hung remove op when OSD logging is
>> increased -- "debug osd = 20" would be nice if you can turn it up that
>> high while attempting to capture the blocked op.
>>
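
For completeness, "debug osd = 20" can either go into the [osd] section of
ceph.conf (which takes effect on the next OSD restart), e.g.

  [osd]
      debug osd = 20

or be injected at runtime as sketched above, which avoids the restart.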



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


