inconsistent pgs


 



> 
> I have 4 physical boxes, each running 2 OSDs. I needed to retire one, so I set
> the 2 OSDs on it to 'out' and everything went as expected. Then I noticed
> that 'ceph health' was reporting that my crush map had legacy tunables. The
> release notes told me I needed to run 'ceph osd crush tunables optimal' to fix
> this, and I wasn't running any old kernel clients, so I did so. Shortly after
> that, my OSDs started dying until only one remained. I eventually figured out
> that they would stay up until I started the OSDs on the 'out' node. I hadn't
> made the connection to the tunables until I turned up an old mailing list post,
> but sure enough, setting the tunables back to legacy got everything stable
> again. I assume that the churn introduced by 'optimal' resulted in a
> situation where the 'out' node stored the only copy of some data, because
> there were down pgs until I got all the OSDs running again.
> 

I forgot to add: on the 'out' node, the following was logged in the OSD log file:

7f5688e59700 -1 osd/PG.cc: In function 'void PG::fulfill_info(pg_shard_t, const pg_query_t&, std::pair<pg_shard_t, pg_info_t>&)' thread 7f5688e59700 time 2014-07-05 21:47:51.595687
osd/PG.cc: 4424: FAILED assert(from == primary)

and on the other nodes when their OSDs crashed:

7fdcb9600700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fdcb9600700 time 2014-07-05 21:14:57.260547
osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")
(Sometimes that would appear on the 'out' node too.)
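
In case it helps anyone comparing notes, here is a rough Python sketch for tallying which asserts fire across a set of OSD log lines. It assumes the assert line always looks like the two excerpts above ("<file>: <line>: FAILED assert(<expr>)" on a line of its own); the function name tally_failed_asserts and the sample input are made up for illustration and are not part of Ceph.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpts above, e.g.:
#   osd/PG.cc: 4424: FAILED assert(from == primary)
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def tally_failed_asserts(lines):
    """Count FAILED assert lines, keyed by (source file, line number, expression)."""
    counts = Counter()
    for raw in lines:
        m = ASSERT_RE.match(raw.strip())
        if m:
            counts[(m["file"], int(m["line"]), m["expr"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass the lines of an OSD log file.
    sample = [
        "osd/PG.cc: 4424: FAILED assert(from == primary)",
        'osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")',
        'osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")',
    ]
    for (src, lineno, expr), n in tally_failed_asserts(sample).most_common():
        print(f"{n}x {src}:{lineno} assert({expr})")
```

To run it against a real log, open the file and pass the file object to tally_failed_asserts(), since it only needs an iterable of lines. Lines that do not match the assumed format are silently skipped.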

Even after the rebalance is complete and the old node is completely retired, with one node down and 2 still running (as a test), I get a very small number (0.006%) of "unfound" pgs. This is a bit of a worry...

James


