Re: How does monitor know OSD is dead?

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 1 Jul 2019 13:43:16 -0700

On Sat, Jun 29, 2019 at 8:13 PM Bryan Henderson <bryanh@xxxxxxxxxxxxxxxx> wrote:
>
> > I'm not sure why the monitor did not mark it _out_ after 600 seconds
> > (default)
>
> Well, that part I understand.  The monitor didn't mark the OSD out because the
> monitor still considered the OSD up.  No reason to mark an up OSD out.
>
> I think the monitor should have marked the OSD down upon not hearing from it
> for 15 minutes ("mon osd report interval"), then out 10 minutes after that
> ("mon osd down out interval").

It sounds like you had the whole cluster powered off, turned it back
on, and those servers didn't come up. That is why the OSDs were never
marked down.

The methods of detecting an OSD as down are:
1) OSD heartbeat peers. This works as Robert describes (by default).
2) When an OSD is connected to a monitor, they heartbeat each other at
very long intervals, and the monitor flags the OSD down if it
disappears and isn't connected to a different monitor.

In your case, the OSD wasn't connected to any monitor, and it hadn't
set up any heartbeat peers.
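
To make the two paths concrete, here is a toy model in Python. It is
only a sketch of the reasoning above, not Ceph code: the Osd class, the
detection_paths function, and the monitor_sessions set are hypothetical
names made up for illustration, and real Ceph adds timeouts and
reporter thresholds that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Osd:
    """Hypothetical stand-in for an OSD daemon (not a Ceph type)."""
    osd_id: int
    alive: bool
    # Soft state: IDs of the OSDs this daemon currently heartbeats with.
    heartbeat_peers: set = field(default_factory=set)


def detection_paths(target, osds, monitor_sessions):
    """List which of the two paths could notice that `target` is dead.

    monitor_sessions is the set of OSD IDs that some monitor holds (or
    recently held) a session with. Like heartbeat_peers, it is soft
    state that is lost when the daemons restart.
    """
    paths = []
    # Path 1: a live OSD that has the target as a heartbeat peer can
    # report it to the monitors.
    for osd in osds:
        if osd.osd_id == target.osd_id:
            continue
        if osd.alive and target.osd_id in osd.heartbeat_peers:
            paths.append("peer-heartbeat")
            break
    # Path 2: a monitor that had a session with the target can notice
    # the session going away.
    if target.osd_id in monitor_sessions:
        paths.append("monitor-session")
    return paths


# Normal case: osd.1 dies while the rest of the cluster keeps running.
dead = Osd(1, alive=False)
survivor = Osd(0, alive=True, heartbeat_peers={1})
normal = detection_paths(dead, [survivor, dead], monitor_sessions={0, 1})

# Cold start: everything was off, osd.1 never came up, so nobody
# peered with it and no monitor ever had a session with it.
dead = Osd(1, alive=False)
survivor = Osd(0, alive=True)
cold = detection_paths(dead, [survivor, dead], monitor_sessions={0})

print(normal)
print(cold)
```

In the cold-start case the list comes back empty, which is the
situation you hit: nothing in the cluster has any state that would
let it notice the OSD is missing.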

Normally, when an OSD restarts, some daemon that used to have a
connection to it is still running and flags it as dead. But if *all*
the daemons in the cluster lose their soft state at once, that can't
happen.
-Greg

>
> And that's worst case.  Though details of how OSDs watch each other are vague,
> I suspect an existing OSD was supposed to detect the dead OSDs and report that
> to the monitor, which would believe it within about a minute and mark the OSDs
> down.  ("osd heartbeat interval", "mon osd min down reports", "mon osd min down
> reporters", "osd reporter subtree level").
>
> --
> Bryan Henderson                                   San Jose, California
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com