> I'm a bit confused about what happened here, though: that 600 second
> interval is only important if *every* OSD in the system is down. If you
> reboot the data center, why didn't *any* OSD daemons start? (And even if
> none did, having the ceph -s report all OSDs down instead of up isn't
> going to change anything except whether your pager is going off, right?)

I think you got lost in the thread of discussion. Enough OSDs for the
cluster to be fully functional _did_ come back. But the cluster insisted
on going to the dead ones (which it claimed all the while were up) for
some I/O, even after running for 20 minutes that way, so the cluster was
not functional. The 600 second "mon osd down out interval" was a red
herring.

It might be relevant that there was a grand total of three OSDs in the
map. One came up; two did not. All objects were replicated across all
three, with the hope that this sort of thing would not be fatal.

It's a Jewel system with that version's default of 1 for "mon osd min
down reporters".

-- 
Bryan Henderson                                   San Jose, California