Quoting Dan van der Ster (dan@xxxxxxxxxxxxxx):
>
> So, first question is: why didn't that OSD get detected as failing
> much earlier?

We have noticed that "mon osd adjust heartbeat grace" made the cluster "realize" that OSDs had gone down _much_ later than the MONs / OSDs themselves did. Setting this parameter to "false" makes failure detection deterministic, and the cluster reacts more quickly. At least that's our experience. This might not be _the_ reason things worked out differently than expected (I guess not), but it does have an impact.

Gr. Stefan

--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
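For reference, here is a minimal ceph.conf sketch of the change described above. It assumes a ceph.conf-based deployment with the option set in the `[mon]` section on the monitor hosts. As I understand it, the option defaults to true. In that mode the monitors stretch an OSD's failure grace period based on that OSD's recorded history of being reported laggy. With the option set to false, the grace stays at the configured `osd heartbeat grace`.

```ini
# ceph.conf on the monitor hosts (sketch, assuming a [mon] section is used)
[mon]
# Default is true: the mon scales the heartbeat grace per OSD based on its
# laggy history. false keeps the grace fixed at "osd heartbeat grace".
mon osd adjust heartbeat grace = false
```

On releases with the centralized config database (Mimic and later), `ceph config set mon mon_osd_adjust_heartbeat_grace false` should be the equivalent. On older clusters, `ceph tell mon.* injectargs '--mon_osd_adjust_heartbeat_grace=false'` applies the change at runtime without a restart.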