What was the exact sequence of events? Were you rebalancing when you did the upgrade? Did the marked-out OSDs get upgraded? Did you restart all the monitors prior to changing the tunables? (Are you *sure*?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sat, Jul 5, 2014 at 10:31 PM, James Harper <james at ejbdigital.com.au> wrote:
>>
>> I have 4 physical boxes, each running 2 OSDs. I needed to retire one, so I set
>> the 2 OSDs on it to 'out', and everything went as expected. Then I noticed
>> that 'ceph health' was reporting that my crush map had legacy tunables. The
>> release notes told me I needed to do 'ceph osd crush tunables optimal' to fix
>> this, and I wasn't running any old kernel clients, so I made it so. Shortly after
>> that, my OSDs started dying until only one remained. I eventually figured out
>> that they would stay up until I started the OSDs on the 'out' node. I hadn't
>> made the connection to the tunables until I turned up an old mailing list post,
>> but sure enough, setting the tunables back to legacy got everything stable
>> again.
>> I assume that the churn introduced by 'optimal' resulted in the
>> situation where the 'out' node stored the only copy of some data, because
>> there were down PGs until I got all the OSDs running again.
>>
>
> Forgot to add: on the 'out' node, the following would be logged in the OSD logfile:
>
> 7f5688e59700 -1 osd/PG.cc: In function 'void PG::fulfill_info(pg_shard_t, const pg_query_t&, std::pair<pg_shard_t, pg_info_t>&)' thread 7f5688e59700 time 2014-07-05 21:47:51.595687
> osd/PG.cc: 4424: FAILED assert(from == primary)
>
> and in the others when they crashed:
>
> 7fdcb9600700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fdcb9600700 time 2014-07-05 21:14:57.260547
> osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")
> (sometimes that would appear on the 'out' node too).
>
> Even after the rebalance is complete and the old node is completely retired, with one node down and 2 still running (as a test), I get a very small number (0.006%) of "unfound" PGs. This is a bit of a worry...
>
> James
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
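
[Editor's note] For readers following along, the tunables change and rollback described in the thread correspond to the standard Ceph CLI commands below. This is a sketch for reference only, not advice from the thread participants; behavior varies by Ceph release, so check the CRUSH tunables documentation for your version before running them on a live cluster.

```shell
# Show the current CRUSH tunables profile and individual tunable values
ceph osd crush show-tunables

# Switch to the 'optimal' tunables profile.
# NOTE: this can trigger a large amount of data movement, and (as this
# thread shows) changing tunables mid-rebalance with OSDs marked 'out'
# is risky.
ceph osd crush tunables optimal

# Roll back to the 'legacy' profile, e.g. if old kernel clients can no
# longer connect or the cluster becomes unstable after the change
ceph osd crush tunables legacy
```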