ceph monitor crashes

Hello, I'm currently running Ceph 0.61 with about 44 OSDs and 4 monitors (one as a spare), spread across about 6 hosts.

I've been running into an issue where, when one Ceph host goes down, the entire system becomes unusable. Today we recovered from an SSD crash that took out an OSD's journal, and it was a lot of work to get everything back up: I couldn't get the monitors to come up and establish quorum. I was going to rebuild it manually, but the Ceph documentation for manually (dirty) removing a monitor with the monmap tool is outdated; I couldn't find the /mon-$id/monmap directory.
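As background on the quorum problem: Ceph monitors need a strict majority of the monitor set to be up before they can form quorum. The sketch below only works through that arithmetic for a few monitor counts; the function names are made up for illustration and are not part of any Ceph tool.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of a monitor set (hypothetical helper)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Compare a few monitor counts, including the 4 used in this cluster.
for n in (3, 4, 5):
    print(f"{n} monitors: quorum needs {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} down")
```

With 4 monitors the majority is 3, so only one monitor can be down at a time, the same as with 3 monitors. If the "spare" is a full member of the monitor map, it raises the quorum size without adding any failure tolerance.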

Anyway, I eventually recovered and was able to run with 4 monitors. I then updated the crushmap, and that crashed the monitor I was sending the updated crushmap to.

It now gives me

[976]: (33) Numerical argument out of domain

when I try to start it manually. I've seen this assert failure before; I'm just not sure what's causing it.
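For reference, the "(33)" in that message looks like a standard errno value followed by its description. Assuming that is what it is, a minimal Python sketch to decode it follows; it only translates the number and says nothing about why the monitor hit the error.

```python
import errno
import os

code = 33  # the number shown in parentheses in the error above

# Symbolic name of the error code, as defined by the C library headers.
print(errno.errorcode[code])

# Human-readable message; the exact wording depends on the platform's libc.
print(os.strerror(code))
```

On Linux this code maps to EDOM, and the message text should match the one in the error line.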

Below is the log from the crash:
https://docs.google.com/a/nopatentpending.com/file/d/0BwQnRodV8ActNTVFUVpLVjdMSGc/edit

I'm not even really sure my configs are right; I'm still pretty new at this.

Below are the configs and the last map:

ceph.conf
https://docs.google.com/file/d/0BwQnRodV8Acta3ZfSnBrOU40MW8/edit?usp=sharing

crush.map.txt
https://docs.google.com/file/d/0BwQnRodV8Actbl9hY054Mm9UTXM/edit?usp=sharing

If you need additional dumps from the monitor, I can get them.

thanks
mr.npp
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com