Re: How does monitor know OSD is dead?

> I'm a bit confused about what happened here, though: that 600 second 
> interval is only important if *every* OSD in the system is down.  If you 
> reboot the data center, why didn't *any* OSD daemons start?  (And even if 
> none did, having the ceph -s report all OSDs down instead of up isn't 
> going to change anything except whether your pager is going off, right?)

I think you got lost in the thread of discussion.  Enough OSDs for the cluster
to be fully functional _did_ come back.  But the cluster insisted on going to
the dead ones (which it claimed all the while were up) for some I/O, even
after running for 20 minutes that way, so the cluster was not functional.  The
600 second "mon osd down out interval" was a red herring.

It might be relevant that there was a grand total of three OSDs in the map.
One came up; two did not.  All objects were replicated across all three, with
the hope that this sort of thing would not be fatal.  It's a Jewel system with
that version's default of 1 for "mon osd min down reporters".
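
For illustration only, here is a toy Python model of how a reporter threshold
like "mon osd min down reporters" would apply to the three-OSD case above.  It
is a sketch under assumptions, not Ceph's monitor code: the function name, the
data layout, and the OSD ids 0, 1 and 2 are hypothetical, and it ignores
heartbeat grace periods and any other timers.

```python
# Toy model, not Ceph code: a monitor marks an OSD down once at least
# min_down_reporters distinct peers have reported it as failed.
def osds_marked_down(failure_reports, min_down_reporters=1):
    """failure_reports maps a target OSD id to the set of OSD ids reporting it."""
    return {
        target
        for target, reporters in failure_reports.items()
        if len(reporters) >= min_down_reporters
    }


# Hypothetical ids for the scenario above: osd.0 came back, osd.1 and osd.2 did not,
# so osd.0 is the only possible reporter.
reports = {1: {0}, 2: {0}}
print(sorted(osds_marked_down(reports, min_down_reporters=1)))  # [1, 2]
print(sorted(osds_marked_down(reports, min_down_reporters=2)))  # []
```

Under this simplified model, a threshold of 1 lets the single surviving OSD get
both dead ones marked down, whereas a threshold of 2 would not.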

-- 
Bryan Henderson                                   San Jose, California
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


