OSD crash with "FAILED ceph_assert(v.length() == p->shard_info->bytes)"

One of my OSDs crashed (the other OSDs are fine), and running
"ceph-bluestore-tool fsck" on it also crashes with the same error. Besides
destroying and re-creating this OSD, are there any other steps I can take
to restore it?

Below is part of the crash message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueStore.cc: 3228: FAILED ceph_assert(v.length() == p->shard_info->bytes)

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562c19f7f73c]
 2: /usr/bin/ceph-osd(+0x57f956) [0x562c19f7f956]
 3: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x7cf) [0x562c1a56d1ef]
 4: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x5dd) [0x562c1a5c5ebd]
 5: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xd1) [0x562c1a5c70e1]
 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x2077) [0x562c1a5cb237]
 7: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x316) [0x562c1a5e66d6]
 8: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x562c1a22a878]
 9: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xeb0) [0x562c1a41cff0]
 10: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x267) [0x562c1a42d357]
 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x562c1a25dd52]
 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x562c1a20168e]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x562c1a088fc9]
 14: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x562c1a2e7e78]
 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x562c1a0a64c8]
 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x562c1a7232a4]
 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x562c1a726184]
 18: /lib64/libpthread.so.0(+0x81ca) [0x7f2d2a4081ca]
 19: clone()

   -10> 2023-01-10T09:28:02.143+0000 7f2cff3de700 -1 *** Caught signal (Aborted) **
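For readers unfamiliar with this assertion: as far as I understand BlueStore, each onode keeps a table of extent-map shards, each with a starting offset and an expected encoded size (`shard_info->bytes`). `ExtentMap::fault_range()` loads the shards covering the requested range from the key/value store and asserts that each stored value has exactly that size. A mismatch would mean the onode metadata and the shard key disagree on this OSD, which would presumably also explain why fsck hits the same assert. The Python below is only a simplified model of that check, not BlueStore's code. `ShardInfo`, `fault_range`, `ShardSizeMismatch`, and keying the store by shard offset are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, List


@dataclass
class ShardInfo:
    # Hypothetical mirror of a per-shard record in the onode: the logical
    # offset where the shard starts and the encoded size the onode expects.
    offset: int
    bytes: int


class ShardSizeMismatch(Exception):
    """Raised where BlueStore would hit the ceph_assert in the crash above."""


def fault_range(shards: List[ShardInfo], kv: Dict[int, bytes],
                start: int, end: int) -> List[bytes]:
    """Load every shard overlapping [start, end) and verify its stored size."""
    loaded = []
    for i, info in enumerate(shards):
        # A shard extends up to the next shard's offset; the last is open-ended.
        shard_end = shards[i + 1].offset if i + 1 < len(shards) else inf
        if shard_end <= start or info.offset >= end:
            continue  # shard does not overlap the requested range
        v = kv.get(info.offset)  # hypothetical: store keyed by shard offset
        if v is None:
            raise LookupError(f"missing shard key at offset {info.offset:#x}")
        # The invariant from the report: v.length() == p->shard_info->bytes
        if len(v) != info.bytes:
            raise ShardSizeMismatch(
                f"shard at {info.offset:#x}: stored {len(v)} bytes, "
                f"onode expects {info.bytes}")
        loaded.append(v)
    return loaded
```

In this model a write that touches a shard whose stored value has the wrong length fails before any data is modified, which matches where the backtrace shows the assert firing (inside `_do_write`, before the write is applied).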


And this is the "meta" file from the crash log:

{
    "crash_id": "2023-01-10T09:28:02.137396Z_a504670d-32c3-46ee-8398-84389c9c2d95",
    "timestamp": "2023-01-10T09:28:02.137396Z",
    "process_name": "ceph-osd",
    "entity_name": "osd.3",
    "ceph_version": "16.2.10",
    "utsname_hostname": "dskm1-r0",
    "utsname_sysname": "Linux",
    "utsname_release": "5.18.19",
    "utsname_version": "#1-NixOS SMP PREEMPT_DYNAMIC Sun Aug 21 13:18:56 UTC 2022",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "8",
    "os_version": "8",
    "assert_condition": "v.length() == p->shard_info->bytes",
    "assert_func": "void BlueStore::ExtentMap::fault_range(KeyValueDB*, uint32_t, uint32_t)",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueStore.cc",
    "assert_line": 3228,
    "assert_thread_name": "tp_osd_tp",
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::ExtentMap::fault_range(KeyValueDB*, uint32_t, uint32_t)' thread 7f2cff3de700 time 2023-01-10T09:28:02.016735+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueStore.cc: 3228: FAILED ceph_assert(v.length() == p->shard_info->bytes)\n",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f2d2a412cf0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562c19f7f78d]",
        "/usr/bin/ceph-osd(+0x57f956) [0x562c19f7f956]",
        "(BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x7cf) [0x562c1a56d1ef]",
        "(BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x5dd) [0x562c1a5c5ebd]",
        "(BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xd1) [0x562c1a5c70e1]",
        "(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x2077) [0x562c1a5cb237]",
        "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x316) [0x562c1a5e66d6]",
        "(non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x562c1a22a878]",
        "(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xeb0) [0x562c1a41cff0]",
        "(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x267) [0x562c1a42d357]",
        "(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x562c1a25dd52]",
        "(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x562c1a20168e]",
        "(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x562c1a088fc9]",
        "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x562c1a2e7e78]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x562c1a0a64c8]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x562c1a7232a4]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x562c1a726184]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f2d2a4081ca]",
        "clone()"
    ]
}
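When comparing this crash with other reports, it can help to pull out only the fields that identify the failure. The sketch below assumes the `meta` file is JSON shaped like the one above. `summarize_crash_meta` is a hypothetical helper, not part of Ceph. It treats backtrace entries whose text starts with `(` as symbolized frames and skips the `__ceph_assert_fail` frame so the list starts at the function that asserted.

```python
import json
import posixpath
from typing import Any, Dict, List


def summarize_crash_meta(meta_text: str, frames: int = 3) -> Dict[str, Any]:
    """Extract the fields that identify a failure from a crash 'meta' JSON."""
    meta = json.loads(meta_text)

    # Keep only symbolized frames such as "(Func(args)+0x..) [addr]".
    # Raw entries like "gsignal()" or "/usr/bin/ceph-osd(+0x...)" are dropped,
    # as is the assert helper itself.
    symbolized: List[str] = [
        f for f in meta.get("backtrace", [])
        if f.startswith("(") and "__ceph_assert_fail" not in f
    ]

    # Paths come from the Linux build host, so split them POSIX-style.
    assert_file = posixpath.basename(meta.get("assert_file", ""))
    return {
        "entity": meta.get("entity_name"),
        "version": meta.get("ceph_version"),
        "condition": meta.get("assert_condition"),
        "where": f"{assert_file}:{meta.get('assert_line')}",
        "top_frames": symbolized[:frames],
    }
```

To use it, read the meta file's text and pass it in. With the meta above, it should report `osd.3`, `16.2.10`, `BlueStore.cc:3228`, and the `fault_range`, `_do_write`, and `_write` frames as the top three.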


I have also uploaded the full crash log to a GitHub gist: https://gist.github.com/yuchangyuan/2016f259175940f64e2eed528d633794

-- 
Best wishes ~
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx


