On Sun, 8 Oct 2017, Ugis wrote: > ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable) > I'm in process of migrating OSDs to bluestore. Several migrations > successful but last one failed. OSD has gone down after some time > backfilling, found some errors in log, tried to start it again, but > same errors appeared and OSD went down. After completing this > bugreport I started OSD again out of curiosity and now it works ok. > During this time I have started another bluestore OSD for backfilling. > Nevertheless errors logged below. > > This seems rather like a bug in OSD not HDD problem. I do not see > SMART problems or any issues related to HDD in dmesg for this OSD. > > The log: > -3> 2017-10-08 16:56:31.861941 7f7481b4f700 -1 > bluestore(/var/lib/ceph/osd/ceph-3) _txc_add_transaction error (7) > Argument list too long not handled on operation 12 (op 2, counting > from 0) > -2> 2017-10-08 16:56:31.861960 7f7481b4f700 -1 > bluestore(/var/lib/ceph/osd/ceph-3) unexpected error code > -1> 2017-10-08 16:56:31.861962 7f7481b4f700 0 > bluestore(/var/lib/ceph/osd/ceph-3) transaction dump: > { > "ops": [ > { > "op_num": 0, > "op_name": "remove", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#" > }, > { > "op_num": 1, > "op_name": "touch", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#" > }, > { > "op_num": 2, > "op_name": "truncate", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#", > "offset": 15560314374 That's a 15GB object! What is this pool use for? Is there really an object that big, or is there a bug? Bluestore has a hard limit of ~4gb for objects. That is about 3 orders of magnitude higher than what we normally store so it not expected to be a problem... sage > }, > { > "op_num": 3, > "op_name": "op_setallochint", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#", > "expected_object_size": "0", > "expected_write_size": "0" > }, > { > "op_num": 4, > "op_name": "write", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#", > "length": 8388608, > "offset": 0, > "bufferlist length": 8388608 > }, > { > "op_num": 5, > "op_name": "setattrs", > "collection": "53.67_head", > "oid": > "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#", > "attr_lens": { > "_": 286, > "snapset": 31 > } > } > ] > } > > 0> 2017-10-08 16:56:31.871938 7f7481b4f700 -1 > /build/ceph-12.2.1/src/os/bluestore/BlueStore.cc: In function 'void > BlueStore::_txc_add_transaction(BlueStore::TransContext*, > ObjectStore::Transaction*)' thread 7f7481b4f700 time 2017-10-08 > 16:56:31.862077 > /build/ceph-12.2.1/src/os/bluestore/BlueStore.cc: 9299: FAILED > assert(0 == "unexpected error") > > ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x102) [0x55fc757a53f2] > 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > ObjectStore::Transaction*)+0x1644) [0x55fc7565f404] > 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, > std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> >&, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e) > [0x55fc75660a7e] > 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, > ObjectStore::Transaction&&, Context*, Context*, Context*, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c) > [0x55fc7520dd8c] > 5: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, > boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598] > 6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391) > [0x55fc754a8a41] > 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) > [0x55fc754b7a21] > 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > [0x55fc753c7aa0] > 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d] > 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) > [0x55fc751af1a9] > 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x55fc75448ae7] > 12: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de] > 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) > [0x55fc757aa1e4] > 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220] > 15: (()+0x76ba) [0x7f749b6d36ba] > 16: (clone()+0x6d) [0x7f749a74a82d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 rbd_mirror > 0/ 5 rbd_replay > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 1/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/10 civetweb > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > 0/ 0 refs > 1/ 5 xio > 1/ 5 compressor > 1/ 5 bluestore > 1/ 5 bluefs > 1/ 3 bdev > 1/ 5 kstore > 4/ 5 rocksdb > 4/ 5 leveldb > 4/ 5 memdb > 1/ 5 kinetic > 1/ 5 fuse > 1/ 5 mgr > 1/ 5 mgrc > 1/ 5 dpdk > 1/ 5 eventtrace > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.3.log > --- end dump of recent events --- > 2017-10-08 16:56:31.909899 7f7481b4f700 -1 *** Caught signal (Aborted) ** > in thread 7f7481b4f700 thread_name:tp_osd_tp > > ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable) > 1: (()+0xa5a634) [0x55fc75762634] > 2: (()+0x11390) [0x7f749b6dd390] > 3: (gsignal()+0x38) [0x7f749a679428] > 4: (abort()+0x16a) [0x7f749a67b02a] > 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x28e) [0x55fc757a557e] > 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > ObjectStore::Transaction*)+0x1644) [0x55fc7565f404] > 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, > std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> >&, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e) > [0x55fc75660a7e] > 8: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, > ObjectStore::Transaction&&, Context*, Context*, Context*, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c) > [0x55fc7520dd8c] > 9: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, > boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598] > 10: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391) > [0x55fc754a8a41] > 11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) > [0x55fc754b7a21] > 12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > [0x55fc753c7aa0] > 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d] > 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) > [0x55fc751af1a9] > 15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x55fc75448ae7] > 16: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de] > 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) > [0x55fc757aa1e4] > 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220] > 19: (()+0x76ba) [0x7f749b6d36ba] > 20: (clone()+0x6d) [0x7f749a74a82d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. > > --- begin dump of recent events --- > 0> 2017-10-08 16:56:31.909899 7f7481b4f700 -1 *** Caught signal > (Aborted) ** > in thread 7f7481b4f700 thread_name:tp_osd_tp > > ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable) > 1: (()+0xa5a634) [0x55fc75762634] > 2: (()+0x11390) [0x7f749b6dd390] > 3: (gsignal()+0x38) [0x7f749a679428] > 4: (abort()+0x16a) [0x7f749a67b02a] > 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x28e) [0x55fc757a557e] > 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > ObjectStore::Transaction*)+0x1644) [0x55fc7565f404] > 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, > std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> >&, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e) > [0x55fc75660a7e] > 8: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, > ObjectStore::Transaction&&, Context*, Context*, Context*, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c) > [0x55fc7520dd8c] > 9: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, > boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598] > 10: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391) > [0x55fc754a8a41] > 11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) > [0x55fc754b7a21] > 12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > [0x55fc753c7aa0] > 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d] > 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) > [0x55fc751af1a9] > 15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x55fc75448ae7] > 16: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de] > 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) > [0x55fc757aa1e4] > 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220] > 19: (()+0x76ba) [0x7f749b6d36ba] > 20: (clone()+0x6d) [0x7f749a74a82d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 rbd_mirror > 0/ 5 rbd_replay > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 1/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/10 civetweb > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > 0/ 0 refs > 1/ 5 xio > 1/ 5 compressor > 1/ 5 bluestore > 1/ 5 bluefs > 1/ 3 bdev > 1/ 5 kstore > 4/ 5 rocksdb > 4/ 5 leveldb > 4/ 5 memdb > 1/ 5 kinetic > 1/ 5 fuse > 1/ 5 mgr > 1/ 5 mgrc > 1/ 5 dpdk > 1/ 5 eventtrace > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.3.log > --- end dump of recent events --- > > Best regards > Ugis > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html