After a power failure left our Jewel cluster crippled, I have hit a sticking point in the attempted recovery. Out of 8 OSDs, we likely lost 5-6, and we're trying to salvage what we can. In addition to RADOS pools, we were also using CephFS, and the cephfs.metadata and cephfs.data pools likely lost plenty of PGs.

The MDS has reported this ever since returning from the power loss:

> # ceph mds stat
> e3627: 1/1/1 up {0=core=up:replay}

When looking at the slow request on the OSD, it shows this op, which I can't quite figure out. Any help appreciated.

> # ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
> {
>     "ops": [
>         {
>             "description": "osd_op(mds.0.3625:8 6.c5265ab3 (undecoded) ack+retry+read+known_if_redirected+full_force e3668)",
>             "initiated_at": "2016-08-31 10:37:18.833644",
>             "age": 22212.235361,
>             "duration": 22212.235379,
>             "type_data": [
>                 "no flag points reached",
>                 [
>                     {
>                         "time": "2016-08-31 10:37:18.833644",
>                         "event": "initiated"
>                     }
>                 ]
>             ]
>         }
>     ],
>     "num_ops": 1
> }

Thanks,
Reed
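
P.S. My working theory is that the stuck op is the MDS trying to read a journal object out of a PG we lost in the metadata pool, which is why replay never finishes. This is roughly how I'd expect to check that (pool id 6 being cephfs.metadata, and a pg_num of 256, are assumptions on my part, not verified on the cluster):

# confirm which pool has id 6 (I'm assuming cephfs.metadata)
ceph osd lspools

# list PGs that are not active, to see if the stuck read lands on one of them
ceph pg dump_stuck inactive

# if pg_num on pool 6 really were 256, object hash c5265ab3 would map to
# PG 6.b3 (hash & (pg_num - 1)); query its state to see if it's incomplete/down
ceph pg 6.b3 query

Does that line of checking make sense, or is there a better way to tie the op back to a specific PG?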