Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

On Tue, 2 Nov 2021 09:02:31 -0500
Sage Weil <sage@xxxxxxxxxxxx> wrote:


> 
> Just to be clear, you should try
>   osd_fast_shutdown = true
>   osd_fast_shutdown_notify_mon = false

I added some logs to the tracker ticket with these options set.
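
For reference, the two options quoted above could be written as a ceph.conf fragment roughly like this. It is only a sketch: putting them in the [osd] section is an assumption, since the thread does not say how the options were applied.

```ini
# Sketch only: the [osd] section placement is an assumption,
# not something stated in this thread.
[osd]
osd_fast_shutdown = true
osd_fast_shutdown_notify_mon = false
```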


> > You wrote that if the OSD rejects messenger connections because it is
> > stopped, the peering process will skip the read_lease timeout. If
> > the OSD announces its shutdown, can we not skip this read_lease
> > timeout as well?
> >
> 
> If memory serves, yes, but the notify_mon process can take more time
> than a peer OSD getting ECONNREFUSED.  The combination above is the
> recommended combination (and the default).

In my tests yesterday I again saw that it took about 2 seconds between
stopping an OSD and the first failure report in the ceph.log.
With the notification enabled, I got the down message immediately.



