Hi,

I am having trouble with Ceph monitor stability: every couple of days the mons die. The only error I see in the logs is:

2014-07-08 14:24:53.056805 7f713bb5b700 -1 mon.cephmon02@1(peon) e2 *** Got Signal Interrupt ***
2014-07-08 14:24:53.061795 7f713bb5b700 1 mon.cephmon02@1(peon) e2 shutdown
2014-07-08 14:24:53.072424 7f713bb5b700 0 quorum service shutdown
2014-07-08 14:24:53.072439 7f713bb5b700 0 mon.cephmon02@1(shutdown).health(36) HealthMonitor::service_shutdown 1 services
2014-07-08 14:24:53.072446 7f713bb5b700 0 quorum service shutdown

ire=2014-07-08 17:32:21.518667 has v0 lc 2837
2014-07-08 17:32:18.642858 7fa7c1a9a700 1 mon.cephmon03@2(peon).paxos(paxos active c 2260..2837) is_readable now=2014-07-08 17:32:18.642861 lease_expire=2014-07-08 17:32:21.518667 has v0 lc 2837
2014-07-08 17:32:18.637279 7fa7c0496700 -1 mon.cephmon03@2(peon) e2 *** Got Signal Interrupt ***
2014-07-08 17:32:18.642936 7fa7c0496700 1 mon.cephmon03@2(peon) e2 shutdown
2014-07-08 17:32:18.643106 7fa7c1a9a700 1 mon.cephmon03@2(peon).paxos(paxos active c 2260..2837) is_readable now=2014-07-08 17:32:18.643109 lease_expire=2014-07-08 17:32:21.518667 has v0 lc 2837
2014-07-08 17:32:18.659001 7fa7c0496700 0 quorum service shutdown
2014-07-08 17:32:18.659016 7fa7c0496700 0 mon.cephmon03@2(shutdown).health(38) HealthMonitor::service_shutdown 1 services
2014-07-08 17:32:18.659023 7fa7c0496700 0 quorum service shutdown
2014-07-08 17:32:18.685100 7fa7bdee8700 0 -- 10.10.3.3:6789/0 >> 10.10.33.31:0/1413204834 pipe(0x56b9180 sd=10 :6789 s=0 pgs=0 cs=0 l=0 c=0x2d06040).accept peer addr is really 10.10.33.31:0/1413204834 (socket is 10.10.33.31:56502/0)

Any ideas?

Bertrand Russell: "The problem with the world is that the stupid are sure of everything and the intelligent are full of doubts."