Re: how to debug slow requests

Wido den Hollander <wido@xxxxxxxx> · Wed, 24 Jul 2019 09:24:13 +0200

On 7/20/19 6:06 PM, Wei Zhao wrote:
> Hi ceph users:
> I was doing  write benchmark, and found some io will be blocked for a
> very long time. The following log is one op , it seems to wait for
> replica to finish. My ceph version is 12.2.4, and the pool is 3+2 EC .
> Does anyone give me some adives about how I sould debug next ?
> 
> {
>     "ops": [
>         {
>             "description": "osd_op(client.17985.0:670679 39.18
> 39:1a63fc5c:::benchmark_data_SH-IDC1-10-5-37-174_2917453_object670678:head
> [set-alloc-hint object_size 1048576 write_size 1048576,write
> 0~1048576] snapc 0=[] ondisk+write+known_if_redirected e1135)",
>             "initiated_at": "2019-07-20 23:13:18.725466",
>             "age": 329.248875,
>             "duration": 329.248901,
>             "type_data": {
>                 "flag_point": "waiting for sub ops",
>                 "client_info": {
>                     "client": "client.17985",
>                     "client_addr": "10.5.137.174:0/1544466091",
>                     "tid": 670679
>                 },
>                 "events": [
>                     {
>                         "time": "2019-07-20 23:13:18.725466",
>                         "event": "initiated"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726585",
>                         "event": "queued_for_pg"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726606",
>                         "event": "reached_pg"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726752",
>                         "event": "started"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726842",
>                         "event": "waiting for subops from 4"
>                     },

This usually indicates there is something going on with osd.4

I would go and see if osd.4 is very busy at that moment and check if the
disk is 100% busy in iostat.

It could be a number of things, but I would check that first.

Wido

>                     {
>                         "time": "2019-07-20 23:13:18.743134",
>                         "event": "op_commit"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.743137",
>                         "event": "op_applied"
>                     }
>                 ]
>             }
>         },
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com