Hello Istvan,

the state "waiting for readable" seems to be related to the read_lease topic documented here: https://docs.ceph.com/en/latest/dev/osd_internals/stale_read/

The only parameter I know of to tune around this is "osd_pool_default_read_lease_ratio", which defaults to 0.8. My clusters complain about slow ops after 5 seconds because of a corresponding configuration setting. On starting, stopping, or restarting OSDs I see slow ops of up to about 15 seconds. On clusters with default settings this is not reported as a slow op, because it stays under the default 32-second limit. We plan to set this parameter to 0.15 to mitigate the issue before we upgrade to Ceph Octopus and later. I would also be happy if someone with deeper knowledge could take a look at this issue.

Regarding your cluster: I don't see what triggers it to get into this state. Did you stop/start some OSDs beforehand? Perhaps you can share some more details.

Manuel

On Thu, 28 Oct 2021 11:19:15 +0000
"Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx> wrote:

> Hi,
>
> I found a couple of emails in the mailing-list archive that might be
> related to this issue, but there it came up after an OSD restart or
> recovery. My cluster is on Octopus 15.2.14 at the moment.
>
> When slow ops start to come in, I can see laggy PGs, and slowly the 3
> RADOS gateways behind the haproxy start to die; when the slow ops are
> gone they restart, but this causes user interruption.
>
> Digging deeper into the affected OSD's historical slow-ops log, I can
> see a couple of slow ops where 32 seconds were spent in the "waiting
> for readable" event. Is there anything that can be done about this?
> To avoid OSD restarts I'm currently running with
> osd_op_thread_suicide_timeout=2000 and osd_op_thread_timeout=90 on
> all OSDs.
>
> Could it be that I'm using too small a number of PGs?
> The data pool is on 4:2 EC, host-based, across 7 nodes; the PG number
> is currently 128, based on the balancer suggestion.
>
> An example slow op:
>
> {
>     "description": "osd_op(client.141400841.0:290235021 28.17s0 28:e94c28ab:::9213182a-14ba-48ad-bde9-289a1c0c0de8.6034919.1_%2fWHITELABEL-1%2fPAGETPYE-5%2fDEVICE-4%2fLANGUAGE-38%2fSUBTYPE-0%2f148341:head [create,setxattr user.rgw.idtag (56) in=14b,setxattr user.rgw.tail_tag (56) in=17b,writefull 0~17706,setxattr user.rgw.manifest (375) in=17b,setxattr user.rgw.acl (123) in=12b,setxattr user.rgw.content_type (10) in=21b,setxattr user.rgw.etag (32) in=13b,setxattr user.rgw.x-amz-meta-storagetimestamp (40) in=36b,call rgw.obj_store_pg_ver in=19b,setxattr user.rgw.source_zone (4) in=20b] snapc 0=[] ondisk+write+known_if_redirected e37602)",
>     "initiated_at": "2021-10-28T16:52:44.652426+0700",
>     "age": 3258.4773889160001,
>     "duration": 32.445993113,
>     "type_data": {
>         "flag_point": "started",
>         "client_info": {
>             "client": "client.141400841",
>             "client_addr": "10.118.199.3:0/462844935",
>             "tid": 290235021
>         },
>         "events": [
>             {
>                 "event": "initiated",
>                 "time": "2021-10-28T16:52:44.652426+0700",
>                 "duration": 0
>             },
>             {
>                 "event": "throttled",
>                 "time": "2021-10-28T16:52:44.652426+0700",
>                 "duration": 6.1143999999999996e-05
>             },
>             {
>                 "event": "header_read",
>                 "time": "2021-10-28T16:52:44.652487+0700",
>                 "duration": 2.2340000000000001e-06
>             },
>             {
>                 "event": "all_read",
>                 "time": "2021-10-28T16:52:44.652489+0700",
>                 "duration": 1.1519999999999999e-06
>             },
>             {
>                 "event": "dispatched",
>                 "time": "2021-10-28T16:52:44.652490+0700",
>                 "duration": 3.146e-06
>             },
>             {
>                 "event": "queued_for_pg",
>                 "time": "2021-10-28T16:52:44.652493+0700",
>                 "duration": 4.0936000000000002e-05
>             },
>             {
>                 "event": "reached_pg",
>                 "time": "2021-10-28T16:52:44.652534+0700",
>                 "duration": 1.9236e-05
>             },
>             {
>                 "event": "waiting for readable",
>                 "time": "2021-10-28T16:52:44.652554+0700",
>                 "duration": 32.430086821000003
>             },
>             {
>                 "event": "reached_pg",
>                 "time": "2021-10-28T16:53:17.082640+0700",
>                 "duration": 0.00010452500000000001
>             },
>             {
>                 "event": "started",
>                 "time": "2021-10-28T16:53:17.082745+0700",
>                 "duration": 0.015673919000000001
>             },
>             {
>                 "event": "done",
>                 "time": "2021-10-28T16:53:17.098419+0700",
>                 "duration": 32.445993113
>             }
>         ]
>     }
> }
>
> Thanks in advance.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
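P.S. A quick back-of-the-envelope check of the numbers above. This is a sketch, assuming (per the stale_read dev doc linked earlier) that the read lease length is osd_heartbeat_grace multiplied by osd_pool_default_read_lease_ratio, with osd_heartbeat_grace at its default of 20 seconds:

```python
# Assumption (from the stale_read dev doc): while a new lease is being
# established, an op can wait in "waiting for readable" for up to roughly
# osd_heartbeat_grace * osd_pool_default_read_lease_ratio seconds.

OSD_HEARTBEAT_GRACE = 20.0  # seconds, Ceph default


def read_lease_seconds(ratio: float, grace: float = OSD_HEARTBEAT_GRACE) -> float:
    """Approximate read lease length for a given read_lease_ratio."""
    return ratio * grace


# Default ratio 0.8 gives about a 16 s lease, which lines up with the
# ~15 s "waiting for readable" stalls I see on OSD restarts.
print(f"ratio 0.8  -> {read_lease_seconds(0.8):.1f} s")

# The planned ratio of 0.15 gives about a 3 s lease, below our 5 s
# slow-op complaint threshold.
print(f"ratio 0.15 -> {read_lease_seconds(0.15):.1f} s")
```

Note that this does not explain the 32-second "waiting for readable" in your dump; that is far longer than any plausible lease length, which is why I suspect something else (e.g. an OSD flap or peering event) is going on in your cluster.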