On Sat, Jun 29, 2019 at 8:12 PM Bryan Henderson <bryanh@xxxxxxxxxxxxxxxx> wrote:
> I'm not sure why the monitor did not mark it _out_ after 600 seconds
> (default)
Well, that part I understand. The monitor didn't mark the OSD out because the
monitor still considered the OSD up. No reason to mark an up OSD out.
I think the monitor should have marked the OSD down upon not hearing from it
for 15 minutes ("mon osd report timeout"), then out 10 minutes after that
("mon osd down out interval").
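
As a quick check of that arithmetic, here is a minimal sketch in Python. The
function and parameter names are made up for illustration; the values are only
the 15-minute and 10-minute defaults quoted above, and a given cluster may
override them.

```python
def worst_case_seconds_to_out(report_timeout_s=900, down_out_interval_s=600):
    """Worst-case delay between an OSD going silent and being marked out.

    Assumes the monitor only notices the OSD through the report timeout
    (no peer OSD reports it sooner), then waits the full down-out interval.
    """
    return report_timeout_s + down_out_interval_s


total = worst_case_seconds_to_out()
print(f"{total} s = {total // 60} min")  # prints "1500 s = 25 min"
```

So with these defaults, the OSD should be out no later than about 25 minutes
after it stops responding.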
And that's worst case. Though details of how OSDs watch each other are vague,
I suspect an existing OSD was supposed to detect the dead OSDs and report that
to the monitor, which would believe it within about a minute and mark the OSDs
down. ("osd heartbeat interval", "mon osd min down reports", "mon osd min down
reporters", "mon osd reporter subtree level".)
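
To make the reporter idea concrete, here is a simplified model in Python. It
is not Ceph's actual code. It assumes the monitor counts failure reports per
distinct CRUSH subtree (for example per host) and acts once a minimum number
of distinct subtrees have reported. The function name, the data shape, and the
threshold of 2 are assumptions for illustration, and the heartbeat grace
timing is ignored entirely.

```python
def enough_reporters(reports, min_reporters=2):
    """Return True if failure reports come from enough distinct subtrees.

    reports: iterable of (reporter_osd_id, reporter_subtree) pairs, where
    reporter_subtree is a hypothetical label such as a host name.
    """
    # Several OSDs in the same subtree count as a single reporter.
    subtrees = {subtree for _osd_id, subtree in reports}
    return len(subtrees) >= min_reporters


# Two OSDs on the same host: one distinct reporter, not enough.
print(enough_reporters([(1, "hostA"), (2, "hostA")]))  # False
# OSDs on two different hosts: enough to believe the report.
print(enough_reporters([(1, "hostA"), (5, "hostB")]))  # True
```

Counting by subtree rather than by OSD means one misbehaving host cannot by
itself get OSDs elsewhere marked down.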
--
Bryan Henderson San Jose, California
Usually, the problem is that an OSD gets too busy and misses heartbeats, so other OSDs wrongly mark it down.
If 'nodown' is set, then the monitor will not mark OSDs down.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com