Hi Manuel,

I'm looking at the ticket for this issue
(https://tracker.ceph.com/issues/51463) and tried to reproduce. This was
initially trivial to do with vstart (rados bench paused for many seconds
after stopping an OSD), but it turns out that was because the vstart
ceph.conf includes `osd_fast_shutdown = false`. Once I enabled that again
(as it is by default on a normal cluster) I did not see any noticeable
interruption when an OSD was stopped.

Can you confirm what osd_fast_shutdown and osd_fast_shutdown_notify_mon
are set to on your cluster?

The intent is that when an OSD goes down, it will no longer accept
messenger connection attempts, and peer OSDs will inform the monitor with
a flag indicating the OSD is definitely dead (vs slow or unresponsive).
That allows the peering process to skip waiting for the read lease to
time out. If you're seeing the laggy or 'waiting for readable' state,
then that isn't happening, probably because the OSD shutdown isn't
working as originally intended.

If it's not one of those two options, maybe you can include a
'ceph config dump' (or just a list of the changed values at least) so we
can see what else might be affecting OSD shutdown...
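For reference, something like the following should show both what is
configured centrally and what a running daemon actually has in effect
(a rough sketch; osd.0 is just a stand-in for one of your OSDs):

  # centrally configured values
  ceph config get osd osd_fast_shutdown
  ceph config get osd osd_fast_shutdown_notify_mon

  # what one running OSD reports it is actually using
  ceph config show osd.0 osd_fast_shutdown
  ceph config show osd.0 osd_fast_shutdown_notify_mon

  # any other non-default options that could be relevant
  ceph config dump | grep -e fast_shutdown -e read_lease

Also, on the read lease timing Manuel mentions below: if I remember
correctly, the lease interval works out to

  read lease = osd_pool_default_read_lease_ratio * osd_heartbeat_grace
             = 0.8 * 20s = 16s with the defaults

which roughly matches the ~15 second stalls he describes. Lowering the
ratio only shrinks that window; with a working fast shutdown the peering
process shouldn't need to wait for the lease to expire at all.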
Thanks!
sage

On Thu, Oct 28, 2021 at 7:23 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
> Hello Istvan,
>
> the state "waiting for readable" seems to be related to the read_lease
> topic documented here:
> https://docs.ceph.com/en/latest/dev/osd_internals/stale_read/
>
> The only parameter I know of to tune around this is
> "osd_pool_default_read_lease_ratio", which defaults to 0.8.
>
> My clusters complain about slow ops after 5 seconds due to the
> corresponding configuration. On starting, stopping or restarting OSDs
> I see slow ops of up to about 15 seconds. In normal clusters this will
> not be shown as a slow op because it is under the default 32-second
> limit.
>
> We plan to set this parameter to 0.15 to mitigate the issue before we
> upgrade to Ceph Octopus and later. I would also be happy if someone
> with deep knowledge could have a look at this issue.
>
>
> Regarding your cluster: I don't see what triggers your cluster to run
> into this state. Did you stop/start some OSDs beforehand?
> Could you give us some more details?
>
>
> Manuel
>
>
>
>
> On Thu, 28 Oct 2021 11:19:15 +0000
> "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx> wrote:
>
> > Hi,
> >
> > I found a couple of emails on the mailing list that might be related
> > to this issue, but there it came after an OSD restart or recovery. My
> > cluster is on Octopus 15.2.14 at the moment.
> >
> > When slow ops start to come in I can see laggy PGs, and slowly the 3
> > RADOS gateways behind the haproxy start to die; once the slow ops are
> > gone they restart, but it causes user interruption.
> >
> > Digging deeper into the affected OSD's historical slow_ops log I can
> > see a couple of slow ops where 32 seconds were spent on the "waiting
> > for readable" event. Is there anything to do about this? To avoid OSD
> > restarts I'm running with osd_op_thread_suicide_timeout=2000 and
> > osd_op_thread_timeout=90 set on all OSDs at the moment.
> >
> > Might it be that I'm using too small a number of PGs?
> > The data pool is a 4:2 EC, host-based, across 7 nodes; currently the
> > PG number is 128, based on the balancer suggestion.
> >
> > An example slow op:
> >
> > {
> >     "description": "osd_op(client.141400841.0:290235021 28.17s0 28:e94c28ab:::9213182a-14ba-48ad-bde9-289a1c0c0de8.6034919.1_%2fWHITELABEL-1%2fPAGETPYE-5%2fDEVICE-4%2fLANGUAGE-38%2fSUBTYPE-0%2f148341:head [create,setxattr user.rgw.idtag (56) in=14b,setxattr user.rgw.tail_tag (56) in=17b,writefull 0~17706,setxattr user.rgw.manifest (375) in=17b,setxattr user.rgw.acl (123) in=12b,setxattr user.rgw.content_type (10) in=21b,setxattr user.rgw.etag (32) in=13b,setxattr user.rgw.x-amz-meta-storagetimestamp (40) in=36b,call rgw.obj_store_pg_ver in=19b,setxattr user.rgw.source_zone (4) in=20b] snapc 0=[] ondisk+write+known_if_redirected e37602)",
> >     "initiated_at": "2021-10-28T16:52:44.652426+0700",
> >     "age": 3258.4773889160001,
> >     "duration": 32.445993113,
> >     "type_data": {
> >         "flag_point": "started",
> >         "client_info": {
> >             "client": "client.141400841",
> >             "client_addr": "10.118.199.3:0/462844935",
> >             "tid": 290235021
> >         },
> >         "events": [
> >             {
> >                 "event": "initiated",
> >                 "time": "2021-10-28T16:52:44.652426+0700",
> >                 "duration": 0
> >             },
> >             {
> >                 "event": "throttled",
> >                 "time": "2021-10-28T16:52:44.652426+0700",
> >                 "duration": 6.1143999999999996e-05
> >             },
> >             {
> >                 "event": "header_read",
> >                 "time": "2021-10-28T16:52:44.652487+0700",
> >                 "duration": 2.2340000000000001e-06
> >             },
> >             {
> >                 "event": "all_read",
> >                 "time": "2021-10-28T16:52:44.652489+0700",
> >                 "duration": 1.1519999999999999e-06
> >             },
> >             {
> >                 "event": "dispatched",
> >                 "time": "2021-10-28T16:52:44.652490+0700",
> >                 "duration": 3.146e-06
> >             },
> >             {
> >                 "event": "queued_for_pg",
> >                 "time": "2021-10-28T16:52:44.652493+0700",
> >                 "duration": 4.0936000000000002e-05
> >             },
> >             {
> >                 "event": "reached_pg",
> >                 "time": "2021-10-28T16:52:44.652534+0700",
> >                 "duration": 1.9236e-05
> >             },
> >             {
> >                 "event": "waiting for readable",
> >                 "time": "2021-10-28T16:52:44.652554+0700",
> >                 "duration": 32.430086821000003
> >             },
> >             {
> >                 "event": "reached_pg",
> >                 "time": "2021-10-28T16:53:17.082640+0700",
> >                 "duration": 0.00010452500000000001
> >             },
> >             {
> >                 "event": "started",
> >                 "time": "2021-10-28T16:53:17.082745+0700",
> >                 "duration": 0.015673919000000001
> >             },
> >             {
> >                 "event": "done",
> >                 "time": "2021-10-28T16:53:17.098419+0700",
> >                 "duration": 32.445993113
> >             }
> >         ]
> >     }
> > }
> >
> > Thanks in advance.
> >
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx