Re: bug in luminous bluestore?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 8 Oct 2017, Ugis wrote:
> ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
> I'm in process of migrating OSDs to bluestore. Several migrations
> successful but last one failed. OSD has gone down after some time
> backfilling, found some errors in log, tried to start it again, but
> same errors appeared and OSD went down. After completing this
> bugreport I started OSD again out of curiosity and now it works ok.
> During this time I have started another bluestore OSD for backfilling.
> Nevertheless errors logged below.
> 
> This seems rather like a bug in OSD not HDD problem. I do not see
> SMART problems or any issues related to HDD in dmesg for this OSD.
> 
> The log:
>     -3> 2017-10-08 16:56:31.861941 7f7481b4f700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _txc_add_transaction error (7)
> Argument list too long not handled on operation 12 (op 2, counting
> from 0)
>     -2> 2017-10-08 16:56:31.861960 7f7481b4f700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) unexpected error code
>     -1> 2017-10-08 16:56:31.861962 7f7481b4f700  0
> bluestore(/var/lib/ceph/osd/ceph-3)  transaction dump:
> {
>     "ops": [
>         {
>             "op_num": 0,
>             "op_name": "remove",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#"
>         },
>         {
>             "op_num": 1,
>             "op_name": "touch",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#"
>         },
>         {
>             "op_num": 2,
>             "op_name": "truncate",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#",
>             "offset": 15560314374

That's a 15GB object!  What is this pool use for?  Is there really an 
object that big, or is there a bug?

Bluestore has a hard limit of ~4gb for objects.  That is about 3 orders of 
magnitude higher than what we normally store so it not expected to be a 
problem...

sage



>         },
>         {
>             "op_num": 3,
>             "op_name": "op_setallochint",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#",
>             "expected_object_size": "0",
>             "expected_write_size": "0"
>         },
>         {
>             "op_num": 4,
>             "op_name": "write",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#",
>             "length": 8388608,
>             "offset": 0,
>             "bufferlist length": 8388608
>         },
>         {
>             "op_num": 5,
>             "op_name": "setattrs",
>             "collection": "53.67_head",
>             "oid":
> "#-55:e6289795:::temp_recovering_53.67_273646'437437_280059_head:head#",
>             "attr_lens": {
>                 "_": 286,
>                 "snapset": 31
>             }
>         }
>     ]
> }
> 
>      0> 2017-10-08 16:56:31.871938 7f7481b4f700 -1
> /build/ceph-12.2.1/src/os/bluestore/BlueStore.cc: In function 'void
> BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)' thread 7f7481b4f700 time 2017-10-08
> 16:56:31.862077
> /build/ceph-12.2.1/src/os/bluestore/BlueStore.cc: 9299: FAILED
> assert(0 == "unexpected error")
> 
>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55fc757a53f2]
>  2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)+0x1644) [0x55fc7565f404]
>  3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> std::vector<ObjectStore::Transaction,
> std::allocator<ObjectStore::Transaction> >&,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e)
> [0x55fc75660a7e]
>  4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
> ObjectStore::Transaction&&, Context*, Context*, Context*,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c)
> [0x55fc7520dd8c]
>  5: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&,
> boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598]
>  6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391)
> [0x55fc754a8a41]
>  7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1)
> [0x55fc754b7a21]
>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> [0x55fc753c7aa0]
>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d]
>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9)
> [0x55fc751af1a9]
>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x55fc75448ae7]
>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de]
>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884)
> [0x55fc757aa1e4]
>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220]
>  15: (()+0x76ba) [0x7f749b6d36ba]
>  16: (clone()+0x6d) [0x7f749a74a82d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
> 2017-10-08 16:56:31.909899 7f7481b4f700 -1 *** Caught signal (Aborted) **
>  in thread 7f7481b4f700 thread_name:tp_osd_tp
> 
>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)
>  1: (()+0xa5a634) [0x55fc75762634]
>  2: (()+0x11390) [0x7f749b6dd390]
>  3: (gsignal()+0x38) [0x7f749a679428]
>  4: (abort()+0x16a) [0x7f749a67b02a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x55fc757a557e]
>  6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)+0x1644) [0x55fc7565f404]
>  7: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> std::vector<ObjectStore::Transaction,
> std::allocator<ObjectStore::Transaction> >&,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e)
> [0x55fc75660a7e]
>  8: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
> ObjectStore::Transaction&&, Context*, Context*, Context*,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c)
> [0x55fc7520dd8c]
>  9: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&,
> boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598]
>  10: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391)
> [0x55fc754a8a41]
>  11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1)
> [0x55fc754b7a21]
>  12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> [0x55fc753c7aa0]
>  13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d]
>  14: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9)
> [0x55fc751af1a9]
>  15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x55fc75448ae7]
>  16: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de]
>  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884)
> [0x55fc757aa1e4]
>  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220]
>  19: (()+0x76ba) [0x7f749b6d36ba]
>  20: (clone()+0x6d) [0x7f749a74a82d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> --- begin dump of recent events ---
>      0> 2017-10-08 16:56:31.909899 7f7481b4f700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f7481b4f700 thread_name:tp_osd_tp
> 
>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)
>  1: (()+0xa5a634) [0x55fc75762634]
>  2: (()+0x11390) [0x7f749b6dd390]
>  3: (gsignal()+0x38) [0x7f749a679428]
>  4: (abort()+0x16a) [0x7f749a67b02a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x55fc757a557e]
>  6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)+0x1644) [0x55fc7565f404]
>  7: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> std::vector<ObjectStore::Transaction,
> std::allocator<ObjectStore::Transaction> >&,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x52e)
> [0x55fc75660a7e]
>  8: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
> ObjectStore::Transaction&&, Context*, Context*, Context*,
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x17c)
> [0x55fc7520dd8c]
>  9: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&,
> boost::intrusive_ptr<OpRequest>)+0x58) [0x55fc75399598]
>  10: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x391)
> [0x55fc754a8a41]
>  11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1)
> [0x55fc754b7a21]
>  12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> [0x55fc753c7aa0]
>  13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x55d) [0x55fc7532c62d]
>  14: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9)
> [0x55fc751af1a9]
>  15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x55fc75448ae7]
>  16: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x130e) [0x55fc751d67de]
>  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884)
> [0x55fc757aa1e4]
>  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fc757ad220]
>  19: (()+0x76ba) [0x7f749b6d36ba]
>  20: (clone()+0x6d) [0x7f749a74a82d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
> 
> Best regards
> Ugis
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux