I've seen this on luminous, but not on mimic. Can you generate a log with
debug osd = 20 leading up to the crash?

Thanks!
sage


On Tue, 8 Jan 2019, Paul Emmerich wrote:

> I've seen this before a few times but unfortunately there doesn't seem
> to be a good solution at the moment :(
>
> See also: http://tracker.ceph.com/issues/23145
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Tue, Jan 8, 2019 at 9:37 AM David Young <funkypenguin@xxxxxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > One of my OSD hosts recently ran into RAM contention (was swapping heavily), and after rebooting, I'm seeing this error on random OSDs in the cluster:
> >
> > ---
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  1: /usr/bin/ceph-osd() [0xcac700]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  2: (()+0x11390) [0x7f8fa5d0e390]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  3: (gsignal()+0x38) [0x7f8fa5241428]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  4: (abort()+0x16a) [0x7f8fa524302a]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7f8fa767c510]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  6: (()+0x2e5587) [0x7f8fa767c587]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x923) [0xbab5e3]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5c3) [0xbade03]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x79c812]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  10: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x730ff8]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xfe) [0x759aae]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x9c5720]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x769760]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) [0x7f8fa76824f6]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f8fa76836b0]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  16: (()+0x76ba) [0x7f8fa5d046ba]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  17: (clone()+0x6d) [0x7f8fa531341d]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > Jan 08 03:34:36 prod1 systemd[1]: ceph-osd@43.service: Main process exited, code=killed, status=6/ABRT
> > ---
> >
> > I've restarted all the OSDs and the mons, but still encountering the above.
> >
> > Any ideas / suggestions?
> >
> > Thanks!
> > D
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
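
A minimal sketch of how the requested debug log could be captured, assuming the crashing daemon is osd.43 (taken from the ceph-osd@43.service line in the quoted log) and that the OSD reads /etc/ceph/ceph.conf on its host. Because the assert fires during peering shortly after startup, setting the level in the config file before the restart is probably more reliable than injecting it at runtime. The section name is an assumption; substitute the id of whichever OSD is crashing.

```ini
# /etc/ceph/ceph.conf on the host running the affected OSD.
# Assumption: osd.43 is the daemon that crashed, per the systemd line in the report.
[osd.43]
    # Verbose OSD logging, as requested in the reply. Remove once the log is captured,
    # since level 20 produces a very large log file.
    debug osd = 20
```

After restarting the daemon (for example with systemctl restart ceph-osd@43) and letting it crash again, the verbose output should end up in the OSD's log file, by default /var/log/ceph/ceph-osd.43.log, unless the log path has been changed locally.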