Re: How does monitor know OSD is dead?

The thing I've seen a lot is an OSD getting marked down because of a failed drive, then adding itself right back again.


On Fri, Jun 28, 2019 at 9:12 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
I'm not sure why the monitor did not mark it down after 600 seconds (the default). The timeout is that long because you don't want to move data around unnecessarily if the OSD is just being rebooted or restarted. Usually, you will still have min_size OSDs available for all PGs, which allows I/O to continue. Then, when the down timeout expires, the cluster will start backfilling and recovering the affected PGs. Double-check that size != min_size for your pools.
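The size/min_size check suggested above can be scripted. This is a minimal sketch, assuming the JSON printed by `ceph osd pool ls detail --format json` is a list of objects carrying `pool_name`, `size` and `min_size` fields; the function name and the sample pools are hypothetical, for illustration only.

```python
import json


def pools_with_no_redundancy_margin(pool_ls_detail_json: str) -> list:
    """Return the names of pools whose size equals min_size.

    Assumes the input is the JSON text of
    `ceph osd pool ls detail --format json`: a list of objects with
    "pool_name", "size" and "min_size" keys (assumed shape).
    """
    pools = json.loads(pool_ls_detail_json)
    # With size == min_size, losing a single replica drops a PG below
    # min_size, so I/O to that PG blocks instead of continuing.
    return [p["pool_name"] for p in pools if p["size"] == p["min_size"]]


# Hypothetical sample data, not taken from any real cluster.
SAMPLE = (
    '[{"pool_name": "rbd", "size": 3, "min_size": 2},'
    ' {"pool_name": "scratch", "size": 2, "min_size": 2}]'
)

if __name__ == "__main__":
    print(pools_with_no_redundancy_margin(SAMPLE))
```

On a live cluster you would feed it the command's real output instead of `SAMPLE`; any pool it reports has no headroom for a single down OSD.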
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Jun 27, 2019 at 5:26 PM Bryan Henderson <bryanh@xxxxxxxxxxxxxxxx> wrote:
What does it take for a monitor to consider an OSD down which has been dead as
a doornail since the cluster started?

A couple of times, I have seen 'ceph status' report an OSD was up, when it was
quite dead.  Recently, a couple of OSDs were on machines that failed to boot
up after a power failure.  The rest of the Ceph cluster came up, though, and
reported all OSDs up and in.  I/Os stalled, probably because they were waiting
for the dead OSDs to come back.

I waited 15 minutes, because the manual says if the monitor doesn't hear a
heartbeat from an OSD in that long (default value of mon_osd_report_timeout),
it marks it down.  But it didn't.  I ran "osd down" commands for the dead OSDs,
the status changed to down, and I/O started working.
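The report-timeout rule quoted above can be modelled roughly as follows. This is a simplified sketch, not the monitor's actual code: the function name is hypothetical, and treating an OSD that has never reported as if it last reported when the monitor started is an assumption made for illustration (how the real monitor handles that case is exactly what this thread is asking). The printed `ceph osd down` lines mirror the manual workaround.

```python
from typing import Dict, List, Optional

# Seconds; the 15-minute default of mon_osd_report_timeout cited above.
MON_OSD_REPORT_TIMEOUT = 900.0


def osds_past_report_timeout(
    last_report: Dict[int, Optional[float]],
    now: float,
    mon_start: float,
    timeout: float = MON_OSD_REPORT_TIMEOUT,
) -> List[int]:
    """Return ids of OSDs silent for more than `timeout` seconds.

    Hypothetical model: a value of None means the OSD has never reported,
    and it is timed from `mon_start` (an assumption, not Ceph behavior).
    """
    stale = []
    for osd_id, ts in sorted(last_report.items()):
        since = ts if ts is not None else mon_start
        if now - since > timeout:
            stale.append(osd_id)
    return stale


if __name__ == "__main__":
    # Hypothetical timestamps in seconds since the monitor started.
    reports = {0: 950.0, 1: 100.0, 2: None}
    for osd_id in osds_past_report_timeout(reports, now=1001.0, mon_start=0.0):
        print(f"ceph osd down {osd_id}")
```

Under these assumptions, an OSD that was dead from the start is only caught if the monitor has some baseline time to measure its silence from; without one, the rule never fires.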

And wouldn't even 15 minutes of grace be unacceptable if it means I/Os have to
wait that long before falling back to a redundant OSD?

--
Bryan Henderson                                   San Jose, California
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
