What was the exact sequence of events? Were you rebalancing when you did the upgrade? Did the marked-out OSDs get upgraded? Did you restart all the monitors prior to changing the tunables? (Are you *sure*?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sat, Jul 5, 2014 at 10:31 PM, James Harper <james at ejbdigital.com.au> wrote:
>>
>> I have 4 physical boxes, each running 2 OSDs. I needed to retire one, so I set
>> the 2 OSDs on it to 'out', and everything went as expected. Then I noticed
>> that 'ceph health' was reporting that my crush map had legacy tunables. The
>> release notes told me I needed to do 'ceph osd crush tunables optimal' to fix
>> this, and I wasn't running any old kernel clients, so I made it so. Shortly after
>> that, my OSDs started dying until only one remained. I eventually figured out
>> that they would stay up until I started the OSDs on the 'out' node. I hadn't
>> made the connection to the tunables until I turned up an old mailing list post,
>> but sure enough, setting the tunables back to legacy got everything stable
>> again.
>> I assume that the churn introduced by 'optimal' resulted in the
>> situation where the 'out' node stored the only copy of some data, because
>> there were down PGs until I got all the OSDs running again.
>>
>
> Forgot to add: on the 'out' node, the following would be logged in the OSD logfile:
>
> 7f5688e59700 -1 osd/PG.cc: In function 'void PG::fulfill_info(pg_shard_t, const pg_query_t&, std::pair<pg_shard_t, pg_info_t>&)' thread 7f5688e59700 time 2014-07-05 21:47:51.595687
> osd/PG.cc: 4424: FAILED assert(from == primary)
>
> and in the others when they crashed:
>
> 7fdcb9600700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fdcb9600700 time 2014-07-05 21:14:57.260547
> osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")
> (sometimes that would appear on the 'out' node too).
>
> Even after the rebalance is complete and the old node is completely retired, with one node down and 2 still running (as a test), I get a very small number (0.006%) of "unfound" PGs. This is a bit of a worry...
>
> James
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
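
[Editor's note] For readers following along, the tunables change and rollback described in the thread correspond to the standard Ceph CLI commands below. This is a sketch for reference only, not advice from the thread participants; behavior varies by Ceph release, so check the CRUSH tunables documentation for your version before running them on a live cluster.

```shell
# Show the current CRUSH tunables profile and individual tunable values
ceph osd crush show-tunables

# Switch to the 'optimal' tunables profile.
# NOTE: this can trigger a large amount of data movement, and (as this
# thread shows) changing tunables mid-rebalance with OSDs marked 'out'
# is risky.
ceph osd crush tunables optimal

# Roll back to the 'legacy' profile, e.g. if old kernel clients can no
# longer connect or the cluster becomes unstable after the change
ceph osd crush tunables legacy
```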