Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Can you try setting paxos_propose_interval to a smaller value, such as 0.3
(the default is 2 seconds), and see whether that has any effect?
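
As a rough sketch, the override could look like this in ceph.conf. The
[mon] section placement is an assumption here (the option is read by the
monitors), and 0.3 is only the trial value suggested above, not a tuned
recommendation:

```ini
[mon]
# Shorten the interval over which the monitors gather updates
# before proposing a new map. 0.3 is the trial value from this thread.
paxos_propose_interval = 0.3
```

On clusters that use the centralized config store,
`ceph config set mon paxos_propose_interval 0.3` should have the same
effect without editing the file.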

It sounds like the problem is not related to getting the OSD marked down
(or at least that is not the only thing going on).  My next guess is that
the peering process that follows needs the OSDs' up_thru values to be
updated, and that there is a delay in that step.

Thanks!
sage


On Thu, Nov 4, 2021 at 4:15 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> On Tue, 2 Nov 2021 09:02:31 -0500
> Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
>
> >
> > Just to be clear, you should try
> >   osd_fast_shutdown = true
> >   osd_fast_shutdown_notify_mon = false
>
> I added some logs to the tracker ticket with these options set.
>
>
> > > You wrote that if the OSD rejects messenger connections because it
> > > is stopped, the peering process will skip the read_lease timeout. If
> > > the OSD announces its shutdown, can we not skip this read_lease
> > > timeout as well?
> > >
> >
> > If memory serves, yes, but the notify_mon process can take more time
> > than a peer OSD getting ECONNREFUSED.  The combination above is the
> > recommended combination (and the default).
>
> In my tests yesterday I saw again that it took about 2 seconds between
> stopping an OSD and the first blame in the ceph.log.
> With the notification enabled, I got the down message immediately.
>
>
>
>
>


