On Thu, Dec 12, 2013 at 10:58 PM, Jeppesen, Nelson <Nelson.Jeppesen@xxxxxxxxxx> wrote:
> I have an issue with incomplete pgs, I’ve tried repairing it but no such
> luck. Any ideas what to check?

Have you looked at http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg ?
In particular, what's the output of "ceph pg query" for 22.4a and 22.ee?
-Greg

> Output from ‘ceph health detail’
>
> HEALTH_ERR 2 pgs inconsistent; 1 pgs recovering; 1 pgs stuck unclean;
> recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%); 4 scrub
> errors
>
> pg 22.ee is stuck unclean for 131473.768406, current state
> active+recovering+inconsistent, last acting [45,16,21]
>
> pg 22.ee is active+recovering+inconsistent, acting [45,16,21], 5 unfound
>
> pg 22.4a is active+clean+inconsistent, acting [2,25,34]
>
> recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%)
>
> 4 scrub errors
>
> I tried to remove one of the nodes and now the service crashes on startup
>
> Dec 12 22:56:32 ceph12 ceph-osd: 0> 2013-12-12 22:56:32.000946 7fe4dcd4a700 -1 *** Caught signal (Aborted) **
>  in thread 7fe4dcd4a700
>
>  ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>  1: /usr/bin/ceph-osd() [0x8001ea]
>  2: (()+0xfcb0) [0x7fe4f029bcb0]
>  3: (gsignal()+0x35) [0x7fe4eea53425]
>  4: (abort()+0x17b) [0x7fe4eea56b8b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe4ef3a669d]
>  6: (()+0xb5846) [0x7fe4ef3a4846]
>  7: (()+0xb5873) [0x7fe4ef3a4873]
>  8: (()+0xb596e) [0x7fe4ef3a496e]
>  9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x8c7087]
>  10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x73) [0x95c163]
>  11: (ReplicatedPG::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*)+0x87f) [0x5f123f]
>  12: (ReplicatedPG::handle_pull(int, PullOp&, PushOp*)+0xc1) [0x5f4611]
>  13:
>      (ReplicatedPG::do_pull(std::tr1::shared_ptr<OpRequest>)+0x4f4) [0x5f53b4]
>  14: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x348) [0x703e38]
>  15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x658620]
>  16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x66ed10]
>  17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6aa25c]
>  18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8f96]
>  19: (ThreadPool::WorkThread::entry()+0x10) [0x8bada0]
>  20: (()+0x7e9a) [0x7fe4f0293e9a]
>  21: (clone()+0x6d) [0x7fe4eeb113fd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Nelson Jeppesen
> Disney Technology Solutions and Services
> Phone 206-588-5001
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
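Greg's request can be applied to every affected PG at once. Below is a minimal Python sketch, assuming the `ceph health detail` line format quoted above (ceph 0.67.x): it collects the PG ids reported as inconsistent or with unfound objects and builds one `ceph pg <pgid> query` command for each. The regular expression and the helper names `problem_pgs` and `query_commands` are hypothetical illustrations, and the health output format may differ in other Ceph releases.

```python
import re

# Health detail text as quoted in the message (ceph 0.67.x format).
HEALTH_DETAIL = """\
HEALTH_ERR 2 pgs inconsistent; 1 pgs recovering; 1 pgs stuck unclean; recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%); 4 scrub errors
pg 22.ee is stuck unclean for 131473.768406, current state active+recovering+inconsistent, last acting [45,16,21]
pg 22.ee is active+recovering+inconsistent, acting [45,16,21], 5 unfound
pg 22.4a is active+clean+inconsistent, acting [2,25,34]
recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%)
4 scrub errors
"""

# Assumption: per-PG lines look like "pg <pool>.<hex seq> is <details>".
# This is inferred from the output above, not a documented contract.
PG_LINE = re.compile(r"^pg (\d+\.[0-9a-f]+) is (.*)$")


def problem_pgs(text: str) -> list[str]:
    """Return PG ids (first-seen order, no duplicates) whose line mentions
    an inconsistent state or unfound objects."""
    seen: dict[str, None] = {}  # dicts keep insertion order
    for line in text.splitlines():
        m = PG_LINE.match(line.strip())
        if m and ("inconsistent" in m.group(2) or "unfound" in m.group(2)):
            seen.setdefault(m.group(1), None)
    return list(seen)


def query_commands(pgids: list[str]) -> list[str]:
    """Build one `ceph pg <pgid> query` command string per PG id."""
    return [f"ceph pg {pgid} query" for pgid in pgids]


if __name__ == "__main__":
    for cmd in query_commands(problem_pgs(HEALTH_DETAIL)):
        print(cmd)
```

Run against the quoted output, this prints the query commands for 22.ee and 22.4a; the actual diagnosis still depends on reading the `ceph pg <pgid> query` output itself.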