Hello, I'm currently running 0.61 with about 44 OSDs and 4 monitors (one as a spare), across about 6 hosts.
I've been running into an issue where, when one Ceph host goes down, the entire system becomes unusable. Today we recovered from an SSD crash that took out an OSD's journal, and it was a lot of work to get it back up: I couldn't get the monitors to come up and establish quorum. I was going to rebuild it manually, but the Ceph documentation for manually (dirty) removing a monitor using the monmap tool is outdated; I couldn't find the /mon-$id/monmap directory.
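For context on the quorum problem, here is a minimal sketch of the arithmetic, assuming the monitors need a strict majority of all monitors in the monmap to form quorum. The function names are made up for illustration and are not part of any Ceph tool.

```python
def quorum_size(num_monitors: int) -> int:
    """Smallest strict majority of the monitors listed in the monmap."""
    if num_monitors < 1:
        raise ValueError("need at least one monitor")
    return num_monitors // 2 + 1


def tolerated_failures(num_monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return num_monitors - quorum_size(num_monitors)


if __name__ == "__main__":
    # Compare a few monitor counts, including the 4 used in this cluster.
    for n in (3, 4, 5):
        print(f"{n} monitors: quorum needs {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

Under that assumption, 4 monitors need 3 up to form quorum, so a fourth "spare" monitor that is in the monmap tolerates no more failures than 3 monitors would.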
Anyway, I eventually recovered and was able to run with 4 monitors. I then updated the crushmap, and that crashed the monitor I was sending the updated crushmap to.
It now gives me
[976]: (33) Numerical argument out of domain
when I try to start it manually. I've seen this assert failure before; I'm just not sure what's causing it.
Below is the log from the crash.
I'm not even really sure if my configs are right; I'm still pretty new at this.
Below are the configs and the last map:
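As a small aid for reading that message, the sketch below looks up an errno number, assuming the "(33)" in the output is a standard errno value on a Linux host. The helper name describe_errno is hypothetical.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the platform's message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # 33 is the number shown in the monitor's error output.
    print(describe_errno(33))
```

On Linux, 33 maps to EDOM, which matches the "Numerical argument out of domain" text; it only says the monitor returned that error code, not why.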
ceph.conf
crush.map.txt
If you need additional dumps from the monitor, I can get them.
Thanks,
mr.npp
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com