Re: how to debug slow requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just so I understand, the duration for this operation is 329 seconds (a lot !) but all the reported events happened ~ at the same time (2019-07-20 23:13:18)
Were all the events of this ops reported ?

Why do you see a problem with the "waiting for subops from 4" event ?

Thanks, Massimo

On Wed, Jul 24, 2019 at 9:31 AM Wido den Hollander <wido@xxxxxxxx> wrote:


On 7/20/19 6:06 PM, Wei Zhao wrote:
> Hi ceph users:
> I was doing  write benchmark, and found some io will be blocked for a
> very long time. The following log is one op , it seems to wait for
> replica to finish. My ceph version is 12.2.4, and the pool is 3+2 EC .
> Does anyone give me some adives about how I sould debug next ?
>
> {
>     "ops": [
>         {
>             "description": "osd_op(client.17985.0:670679 39.18
> 39:1a63fc5c:::benchmark_data_SH-IDC1-10-5-37-174_2917453_object670678:head
> [set-alloc-hint object_size 1048576 write_size 1048576,write
> 0~1048576] snapc 0=[] ondisk+write+known_if_redirected e1135)",
>             "initiated_at": "2019-07-20 23:13:18.725466",
>             "age": 329.248875,
>             "duration": 329.248901,
>             "type_data": {
>                 "flag_point": "waiting for sub ops",
>                 "client_info": {
>                     "client": "client.17985",
>                     "client_addr": "10.5.137.174:0/1544466091",
>                     "tid": 670679
>                 },
>                 "events": [
>                     {
>                         "time": "2019-07-20 23:13:18.725466",
>                         "event": "initiated"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726585",
>                         "event": "queued_for_pg"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726606",
>                         "event": "reached_pg"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726752",
>                         "event": "started"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.726842",
>                         "event": "waiting for subops from 4"
>                     },

This usually indicates there is something going on with osd.4

I would go and see if osd.4 is very busy at that moment and check if the
disk is 100% busy in iostat.

It could be a number of things, but I would check that first.

Wido

>                     {
>                         "time": "2019-07-20 23:13:18.743134",
>                         "event": "op_commit"
>                     },
>                     {
>                         "time": "2019-07-20 23:13:18.743137",
>                         "event": "op_applied"
>                     }
>                 ]
>             }
>         },
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux