I didn't dig into it, but maybe compare to http://tracker.ceph.com/issues/16525
and see if they're the same issue? Or search for other monitor crashes
involving CRUSH. Looks like the backport PR is still outstanding.
-Greg

On Thu, Aug 18, 2016 at 4:58 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I've stumbled across a problem in jewel with respect to crush rulesets. Our
> setup currently defines two replicated rulesets:
>
> # ceph osd crush rule list
> [
>     "replicated_ruleset",
>     "replicated_ssd_only",
>     "six_two_ec"
> ]
>
> (the third ruleset is an EC ruleset)
>
> Both replicated rulesets are quite simple:
>
> # ceph osd crush rule dump replicated_ssd_only
> {
>     "rule_id": 1,
>     "rule_name": "replicated_ssd_only",
>     "ruleset": 2,
>     "type": 1,
>     "min_size": 2,
>     "max_size": 4,
>     "steps": [
>         {
>             "op": "take",
>             "item": -9,
>             "item_name": "ssd"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> # ceph osd crush rule dump replicated_ruleset
> {
>     "rule_id": 0,
>     "rule_name": "replicated_ruleset",
>     "ruleset": 0,
>     "type": 1,
>     "min_size": 1,
>     "max_size": 10,
>     "steps": [
>         {
>             "op": "take",
>             "item": -3,
>             "item_name": "default"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> The corresponding crush tree has two roots:
>
> ID  WEIGHT    TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -9   5.97263 root ssd
> -18   0.53998     host ceph-storage-06-ssd
>  86   0.26999         osd.86                     up  1.00000          1.00000
>  88   0.26999         osd.88                     up  1.00000          1.00000
> -19   0.53998     host ceph-storage-05-ssd
> 100   0.26999         osd.100                    up  1.00000          1.00000
>  99   0.26999         osd.99                     up  1.00000          1.00000
> ...
>  -3 531.43933 root default
> -10  61.87991     host ceph-storage-02
>  35   5.45999         osd.35                     up  1.00000          1.00000
>  74   5.45999         osd.74                     up  1.00000          1.00000
> 111   5.45999         osd.111                    up  1.00000          1.00000
> 112   5.45999         osd.112                    up  1.00000          1.00000
> 113   5.45999         osd.113                    up  1.00000          1.00000
> 114   5.45999         osd.114                    up  1.00000          1.00000
> 115   5.45999         osd.115                    up  1.00000          1.00000
> 116   5.45999         osd.116                    up  1.00000          1.00000
> 117   5.45999         osd.117                    up  1.00000          1.00000
> 118   3.64000         osd.118                    up  1.00000          1.00000
> 119   5.45999         osd.119                    up  1.00000          1.00000
> 120   3.64000         osd.120                    up  1.00000          1.00000
> ....
>
> So the first (default) ruleset should use the spinning rust and the second
> one should use the SSDs. Pretty standard setup for SSDs colocated with HDDs.
>
> After changing the crush ruleset for an existing pool ('.log' from radosgw)
> to replicated_ssd_only, two of the three mons crashed, leaving the cluster
> inaccessible.
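>
> For reference, the pool change corresponds to the following command (a
> reconstruction from the mon_command/audit entries in the log below; '2' is
> the "ruleset" number of replicated_ssd_only as shown in the dump above, not
> its "rule_id", which is 1):
>
> # ceph osd pool set .log crush_ruleset 2
>
> The rule_id/ruleset mismatch for this rule may or may not be relevant, but
> it is worth noting, since for replicated_ruleset both numbers are 0.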
> Log file content:
>
> ....
> -13> 2016-08-18 12:22:10.800961 7fb7b5ae2700 10 log_client _send_to_monlog to self
> -12> 2016-08-18 12:22:10.800961 7fb7b5ae2700 10 log_client log_queue is 8 last_log 8 sent 7 num 8 unsent 1 sending 1
> -11> 2016-08-18 12:22:10.800963 7fb7b5ae2700 10 log_client will send 2016-08-18 12:22:10.800960 mon.1 192.168.6.133:6789/0 8 : audit [INF] from='client.3839479 :/0' entity='unknown.' cmd=[{"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"}]: dispatch
> -10> 2016-08-18 12:22:10.800969 7fb7b5ae2700 1 -- 192.168.6.133:6789/0 --> 192.168.6.133:6789/0 -- log(1 entries from seq 8 at 2016-08-18 12:22:10.800960) v1 -- ?+0 0x7fb7cc4318c0 con 0x7fb7cb5f6e80
> -9> 2016-08-18 12:22:10.800977 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.800976, event: psvc:dispatch, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -8> 2016-08-18 12:22:10.800980 7fb7b5ae2700 5 mon.ceph-storage-05@1(leader).paxos(paxos active c 79420671..79421306) is_readable = 1 - now=2016-08-18 12:22:10.800980 lease_expire=2016-08-18 12:22:15.796784 has v0 lc 79421306
> -7> 2016-08-18 12:22:10.800986 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.800986, event: osdmap:preprocess_query, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -6> 2016-08-18 12:22:10.800992 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.800992, event: osdmap:preprocess_command, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -5> 2016-08-18 12:22:10.801022 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.801022, event: osdmap:prepare_update, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -4> 2016-08-18 12:22:10.801029 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.801029, event: osdmap:prepare_command, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -3> 2016-08-18 12:22:10.801041 7fb7b5ae2700 5 -- op tracker -- seq: 92, time: 2016-08-18 12:22:10.801041, event: osdmap:prepare_command_impl, op: mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": ".log", "val": "2"} v 0)
> -2> 2016-08-18 12:22:10.802750 7fb7af185700 1 -- 192.168.6.133:6789/0 >> :/0 pipe(0x7fb7cc373400 sd=56 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fb7cc34aa80).accept sd=56 192.168.6.132:53238/0
> -1> 2016-08-18 12:22:10.802877 7fb7af185700 2 -- 192.168.6.133:6789/0 >> 192.168.6.132:6800/21078 pipe(0x7fb7cc373400 sd=56 :6789 s=2 pgs=89 cs=1 l=1 c=0x7fb7cc34aa80).reader got KEEPALIVE2 2016-08-18 12:22:10.802927
> 0> 2016-08-18 12:22:10.802989 7fb7b5ae2700 -1 *** Caught signal (Segmentation fault) **
> in thread 7fb7b5ae2700 thread_name:ms_dispatch
>
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 1: (()+0x5055ea) [0x7fb7bfc9d5ea]
> 2: (()+0xf100) [0x7fb7be520100]
> 3: (OSDMonitor::prepare_command_pool_set(std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&)+0x122f) [0x7fb7bfaa997f]
> 4: (OSDMonitor::prepare_command_impl(std::shared_ptr<MonOpRequest>, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&)+0xf02c) [0x7fb7bfab968c]
> 5: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x64f) [0x7fb7bfabe46f]
> 6: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x307) [0x7fb7bfabffc7]
> 7: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xe0b) [0x7fb7bfa6e60b]
> 8: (Monitor::handle_command(std::shared_ptr<MonOpRequest>)+0x1d22) [0x7fb7bfa2a4f2]
> 9: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0x33b) [0x7fb7bfa3617b]
> 10: (Monitor::_ms_dispatch(Message*)+0x6c9) [0x7fb7bfa37519]
> 11: (Monitor::handle_forward(std::shared_ptr<MonOpRequest>)+0x89c) [0x7fb7bfa359ac]
> 12: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xc70) [0x7fb7bfa36ab0]
> 13: (Monitor::_ms_dispatch(Message*)+0x6c9) [0x7fb7bfa37519]
> 14: (Monitor::ms_dispatch(Message*)+0x23) [0x7fb7bfa58063]
> 15: (DispatchQueue::entry()+0x78a) [0x7fb7bfeb0d1a]
> 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fb7bfda620d]
> 17: (()+0x7dc5) [0x7fb7be518dc5]
> 18: (clone()+0x6d) [0x7fb7bcde0ced]
>
> The complete log is available on request. I was able to recover the cluster
> by fencing the third, still active mon (shutting down its network interface)
> and restarting the other two mons. They kept crashing after a short time
> with the same stack trace until I was able to issue the command to change
> the crush ruleset back to 'replicated_ruleset'. After re-enabling the
> network interface and restarting the services, the third mon (and the OSD
> on that host) rejoined the cluster.
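>
> For completeness, the revert amounts to pointing the pool back at ruleset 0
> (replicated_ruleset); a sketch, reconstructed from the description above:
>
> # ceph osd pool set .log crush_ruleset 0
>
> The value a pool currently references can be checked with
> 'ceph osd pool get .log crush_ruleset'.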
>
> Regards,
> Burkhard