Re: Luminous: example of a single down osd taking out a cluster

Quoting Dan van der Ster (dan@xxxxxxxxxxxxxx):
> 
> So, first question is: why didn't that OSD get detected as failing
> much earlier?

We have noticed that with "mon osd adjust heartbeat grace" enabled, the
cluster "realizes" that OSDs have gone down _much_ later than the MONs /
OSDs themselves do. Setting this parameter to "false" makes failure
detection deterministic, and the cluster reacts more quickly. At least
that's our experience.
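
For reference, a minimal sketch of that setting as a ceph.conf fragment.
Placing it under [global] is an assumption made here for illustration, not
something stated above; adapt it to however your cluster manages its
configuration, and restart or otherwise reload the monitors so they pick
it up.

```ini
[global]
# Assumption: set cluster-wide so the monitors read it.
# Disables the adjustment of the heartbeat grace period, so that
# failure detection is deterministic.
mon osd adjust heartbeat grace = false
```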

This might not be _the_ reason things worked out differently than
expected (I guess not), but it does have an impact.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


