Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

On 02.11.21 at 15:02, Sage Weil wrote:
On Tue, Nov 2, 2021 at 8:29 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

Hi Sage,

"osd_fast_shutdown" is set to "false".
When we upgraded to Luminous I also had blocked IO issues with this
enabled.

Some weeks ago I tried out the options "osd_fast_shutdown" and
"osd_fast_shutdown_notify_mon" and also got slow ops while
stopping/starting OSDs. But I didn't check whether this triggered the
problem with the read_leases or my old issue
with the fast shutdown.
Just to be clear, you should try
   osd_fast_shutdown = true
   osd_fast_shutdown_notify_mon = false

You wrote that if the OSD rejects messenger connections because it is
stopped, the peering process will skip the read_lease timeout. If the
OSD announces its shutdown, can we not skip this read_lease timeout as
well?

If memory serves, yes, but the notify_mon process can take more time than a
peer OSD getting ECONNREFUSED.  The combination above is the recommended
one (and the default).
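
For reference, a minimal sketch of how the combination above might look in
ceph.conf. The two option names and values are the ones given in this thread;
putting them in the [osd] section is an assumption, since the thread does not
say where they were set.

```ini
# Sketch only: the [osd] section placement is an assumption.
[osd]
# Stop the OSD immediately; peers get ECONNREFUSED on their connections.
osd_fast_shutdown = true
# Do not wait for the monitor to be notified of the shutdown.
osd_fast_shutdown_notify_mon = false
```

The intent, as described above, is that peers notice the stopped OSD through
refused connections rather than waiting on the slower monitor notification.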


When we faced this issue we had a fresh Octopus install with default values...

If necessary I can upgrade our development cluster to Octopus again and also
run some tests.


Best,

Peter


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


