Hi Dan,

Mmm, it looks like a MegaRAID hang / hardware failure. Curious timing: today we're doing heavy bucket deletions... and today the disk fails... just our luck.

Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 81 e9 08 00 02 00 00
Aug 17 15:44:12 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 461498632
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11557 (650994209s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 0a(e0x20/s10) at 1b087380
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11558 (650994209s/0x0001/FATAL) - Uncorrectable medium error logged for VD 08/8 at 1b087380 (on PD 0a(e0x20/s10) at 1b087380)
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 15:44:56 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 15:45:27 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 81 e9 08 00 02 00 00
Aug 17 16:38:43 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 461498632
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 16:38:51 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 16:38:58 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
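In case it's useful, those entries were pulled from the kernel log as you suggested; something along these lines is enough to spot them (the grep pattern is just what matched on this box):

    journalctl -k | grep -iE 'medium error|blk_update_request|megaraid'

Since the drive sits behind the MegaRAID controller, SMART has to be queried through the controller; the device number 10 below is an assumption taken from the s10 slot in the log, so adjust it to the actual PD:

    smartctl -a -d megaraid,10 /dev/sdi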
Just replaced the disk; there were no SMART errors on it previously.

Regards,
Manuel

-----Original Message-----
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: Monday, August 17, 2020 17:31
To: EDH - Manuel Rios <mriosfer@xxxxxxxxxxxxxxxx>
CC: ceph-users <ceph-users@xxxxxxx>
Subject: Re: OSD RGW Index 14.2.11 crash

Hi,

Do you have SCSI errors around the time of the crash? Run `journalctl -k` and look for SCSI medium errors.

Cheers, Dan

On Mon, Aug 17, 2020 at 3:50 PM EDH - Manuel Rios <mriosfer@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, today one of our SSDs dedicated to the RGW index crashed; maybe a bug, or the OSD just crashed.
>
> Our current version is 14.2.11, and today we're under heavy object processing... approx 60 TB of data.
>
>  ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x563f96b550e5]
>  2: (()+0x4d72ad) [0x563f96b552ad]
>  3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
>  4: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
>  5: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
>  6: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
>  7: (()+0x102fd29) [0x563f976add29]
>  8: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
>  9: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
>  10: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
>  11: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
>  12: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
>  13: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
>  14: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
>  15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
>  16: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
>  17: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
>  18: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
>  19: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
>  20: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
>  21: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
>  22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
>  23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
>  24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
>  25: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
>  26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
>  27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
>  28: (()+0x7dd5) [0x7f1e504eddd5]
>  29: (clone()+0x6d) [0x7f1e4f3ad02d]
>
>      0> 2020-08-17 15:45:27.609 7f1e2fa82700 -1 *** Caught signal (Aborted) ** in thread 7f1e2fa82700 thread_name:tp_osd_tp
>
>  ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>  1: (()+0xf5d0) [0x7f1e504f55d0]
>  2: (gsignal()+0x37) [0x7f1e4f2e52c7]
>  3: (abort()+0x148) [0x7f1e4f2e69b8]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x563f96b55134]
>  5: (()+0x4d72ad) [0x563f96b552ad]
>  6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
>  7: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
>  8: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
>  9: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
>  10: (()+0x102fd29) [0x563f976add29]
>  11: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
>  12: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
>  13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
>  14: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
>  15: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
>  16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
>  17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
>  18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
>  19: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
>  20: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
>  21: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
>  22: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
>  23: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
>  24: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
>  25: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
>  26: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
>  27: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
>  28: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
>  29: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
>  30: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
>  31: (()+0x7dd5) [0x7f1e504eddd5]
>  32: (clone()+0x6d) [0x7f1e4f3ad02d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
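>
> In case it helps with the analysis: the same backtrace should also be captured by the mgr crash module (assuming it is enabled on this cluster; <crash-id> below is just a placeholder):
>
>     ceph crash ls
>     ceph crash info <crash-id>
>
> And to check whether the OSD's BlueStore/RocksDB metadata itself is damaged, a read-only fsck can be run with the OSD stopped (adjust the OSD id and data path as needed):
>
>     systemctl stop ceph-osd@<id>
>     ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>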
>
> Any ideas, or has anyone seen a similar situation?
>
>
> Manuel Ríos Fernández
> CEO - Business development
> 677677179
> mriosfer@xxxxxxxxxxxxxxxx
>
> Please don't print this e-mail unless necessary. Let's protect the environment.
>
> This e-mail message and all attachments transmitted with it may contain legally privileged, proprietary and/or confidential information intended solely for the use of the addressee. If you are not the intended recipient, you are hereby notified that any review, dissemination, distribution, duplication or other use of this message and/or its attachments is strictly prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message and its attachments. Thank you.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx