Re: OSD assert fail

Hi,

This is one we've seen before; it is tracked as issue #326:

	http://tracker.newdream.net/issues/326

Was that the first (and only?) osd to fail?

What kind of workload were you subjecting the cluster to?  Just the file 
system?  RBD?  Anything unusual?

Also, can you confirm what version of the code you were running?  The osd 
log at /var/log/ceph/osd.*.log should have a version number and sha1 id, 
something like

ceph version 0.22~rc (3cd9d853cd58c79dc12427be8488e57970abda04)
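
If it saves some digging, something along these lines could pull the version banner out of each osd log. It is only a rough sketch: the regex assumes the banner looks exactly like the sample line above (version string, then a 40-character sha1 in parentheses), and the function names are made up for this example, not part of ceph.

```python
import glob
import re

# Assumed banner format, modeled on the sample line above:
#   ceph version <version> (<40-hex-digit sha1>)
VERSION_RE = re.compile(r"ceph version (\S+) \(([0-9a-f]{40})\)")


def parse_version_line(line):
    """Return (version, sha1) if the line carries a version banner, else None."""
    match = VERSION_RE.search(line)
    return match.groups() if match else None


def versions_in_logs(pattern="/var/log/ceph/osd.*.log"):
    """Map each matching log file to the set of (version, sha1) pairs it mentions."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps stray non-UTF-8 bytes in a log from aborting the scan.
        with open(path, errors="replace") as log:
            pairs = {p for p in map(parse_version_line, log) if p}
        if pairs:
            found[path] = pairs
    return found


if __name__ == "__main__":
    for path, pairs in versions_in_logs().items():
        for version, sha1 in sorted(pairs):
            print(f"{path}: {version} {sha1}")
```

If a log lists more than one pair, the daemon was probably restarted on a different build at some point, which would be worth mentioning too.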

Thanks!
sage


On Mon, 6 Sep 2010, Leander Yu wrote:

> Hi all,
> I have set up a ceph cluster with 10 osds, 2 mds, and 3 mons. It ran fine
> at first, but after one day some of the osds crashed with the following
> assert failures.
> I am using the unstable trunk. ceph.conf is attached.
> 
> -------------- osd 3 -----------------
> osd/PG.h: In function 'void PG::IndexedLog::index(PG::Log::Entry&)':
> osd/PG.h:429: FAILED assert(caller_ops.count(e.reqid) == 0)
>  1: (OSD::_process_pg_info(unsigned int, int, PG::Info&, PG::Log&,
> PG::Missing&, std::map<int, MOSDPGInfo*, std::less<int>,
> std::allocator<std::pair<int const, MOSDPGInfo*> > >*, int&)+0xb06)
> [0x4cf426]
>  2: (OSD::handle_pg_log(MOSDPGLog*)+0xa9) [0x4cf999]
>  3: (OSD::_dispatch(Message*)+0x3ed) [0x4e7dfd]
>  4: (OSD::ms_dispatch(Message*)+0x39) [0x4e86c9]
>  5: (SimpleMessenger::dispatch_entry()+0x789) [0x46b5f9]
>  6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45849c]
>  7: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
>  8: (()+0x6a3a) [0x7f69fd39ea3a]
>  9: (clone()+0x6d) [0x7f69fc5bc77d]
> 
> -------------- osd 7 --------------------
> osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_pull(MOSDSubOp*)':
> osd/ReplicatedPG.cc:3021: FAILED assert(r == 0)
>  1: (OSD::dequeue_op(PG*)+0x344) [0x4e6fd4]
>  2: (ThreadPool::worker()+0x28f) [0x5b5a9f]
>  3: (ThreadPool::WorkThread::entry()+0xd) [0x4f0acd]
>  4: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
>  5: (()+0x6a3a) [0x7efff4f12a3a]
>  6: (clone()+0x6d) [0x7efff413077d]
> 
> Please let me know if you need more information. I am keeping the
> environment as it is so that I can collect more data for debugging.
> 
> Thanks.
> 
