If you can get them back online, you could then reweight each one manually until you see a good balance, then turn on auto-balancing. An entire host being down could be a problem though. Also, you didn't mention the highest size (replication) you have active.

-Brent

-----Original Message-----
From: Joshua Schaeffer <jschaeffer@xxxxxxxxxxxxxxx>
Sent: Saturday, August 20, 2022 3:31 PM
To: ceph-users@xxxxxxx
Subject: Ceph disks fill up to 100%

I don't have a ton of experience troubleshooting Ceph issues, and I'm running into an issue where the OSDs are filling up to 100% and then getting marked down in the cluster. My overall ceph status is in HEALTH_WARN right now and I'm not sure exactly where I should be looking:

user@ceph01:~$ sudo ceph status
  cluster:
    id:     7138ab3f-ee99-42ac-9f3d-42c460f0b6a1
    health: HEALTH_WARN
            1 osds down
            Reduced data availability: 4 pgs inactive
            Degraded data redundancy: 9889/39939 objects degraded (24.760%), 92 pgs degraded, 125 pgs undersized
            129 pgs not deep-scrubbed in time
            129 pgs not scrubbed in time
            4 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 2w)
    mgr: b(active, since 2w), standbys: c, a
    osd: 16 osds: 10 up (since 18h), 11 in (since 7m); 40 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 13.31k objects, 51 GiB
    usage:   113 GiB used, 887 GiB / 1000 GiB avail
    pgs:     3.101% pgs not active
             9889/39939 objects degraded (24.760%)
             7398/39939 objects misplaced (18.523%)
             88 active+undersized+degraded
             36 active+undersized+remapped
             2  undersized+degraded+remapped+backfill_wait+peered
             2  undersized+degraded+remapped+backfilling+peered
             1  active+undersized

  io:
    client:   116 KiB/s rd, 981 KiB/s wr, 76 op/s rd, 64 op/s wr
    recovery: 22 MiB/s, 5 objects/s

Each of the "down" OSDs is showing 100% disk utilization, and utilization on the drives that are still up is getting dangerously high:

user@ceph01:~$ sudo ceph df
--- RAW STORAGE ---
CLASS  SIZE      AVAIL    USED     RAW USED  %RAW USED
hdd    1000 GiB  886 GiB  104 GiB  114 GiB   11.38
TOTAL  1000 GiB  886 GiB  104 GiB  114 GiB   11.38

--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics  1   1    0 B     0        0 B      0      425 GiB
cloudstack             3   128  66 GiB  13.32k   151 GiB  10.57  560 GiB

user@ceph01:~$ sudo ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
-1         1.56299  root default
-3         0.39075      host bllcloudceph01
 1    hdd  0.09769          osd.1            down    0         1.00000
 2    hdd  0.09769          osd.2            down    0         1.00000
 3    hdd  0.09769          osd.3            down    0         1.00000
 4    hdd  0.09769          osd.4            down    0         1.00000
-5         0.58612      host bllcloudceph02
 5    hdd  0.09769          osd.5            up      1.00000   1.00000
 6    hdd  0.09769          osd.6            up      1.00000   1.00000
 7    hdd  0.09769          osd.7            up      1.00000   1.00000
 8    hdd  0.09769          osd.8            down    0         1.00000
 9    hdd  0.09769          osd.9            up      1.00000   1.00000
10    hdd  0.09769          osd.10           up      1.00000   1.00000
-7         0.58612      host bllcloudceph03
11    hdd  0.09769          osd.11           up      1.00000   1.00000
12    hdd  0.09769          osd.12           up      1.00000   1.00000
13    hdd  0.09769          osd.13           up      1.00000   1.00000
14    hdd  0.09769          osd.14           up      1.00000   1.00000
15    hdd  0.09769          osd.15           up      1.00000   1.00000
16    hdd  0.09769          osd.16           down    1.00000   1.00000

jschaeffer@bllcloudceph01:~$ sudo df -h | grep ceph
/dev/sdc  69G  69G  16K  100%  /var/lib/ceph/osd/ceph-0
/dev/sde  69G  69G  16K  100%  /var/lib/ceph/osd/ceph-2
/dev/sdf  69G  69G  16K  100%  /var/lib/ceph/osd/ceph-3
/dev/sdh  69G  69G  20K  100%  /var/lib/ceph/osd/ceph-4
/dev/sdd  69G  69G  16K  100%  /var/lib/ceph/osd/ceph-1

The amount of data the actual cluster is using shouldn't be that much (less than 200 GB).

When I try to manually start one of these OSDs, it throws a lot of errors:

user@ceph01:~$ sudo ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface
2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:46.742+0000 7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true}
2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected
0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read) ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b] 2: (()+0x9e8a63) [0x5579a7c36a63] 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, 
ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 16: (()+0x9609) [0x7f5c51b66609] 17: (clone()+0x43) [0x7f5c516d2293] *** Caught signal (Aborted) ** in thread 7f5c44d71700 thread_name:ms_dispatch 2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read) ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x155) [0x5579a7c3685b] 2: (()+0x9e8a63) [0x5579a7c36a63] 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 16: (()+0x9609) 
[0x7f5c51b66609] 17: (clone()+0x43) [0x7f5c516d2293] ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (()+0x153c0) [0x7f5c51b723c0] 2: (gsignal()+0xcb) [0x7f5c515f618b] 3: (abort()+0x12b) [0x7f5c515d5859] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6] 5: (()+0x9e8a63) [0x5579a7c36a63] 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 14: (OSD::_dispatch(Message*)+0x18b) 
[0x5579a7d1bdbb] 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 19: (()+0x9609) [0x7f5c51b66609] 20: (clone()+0x43) [0x7f5c516d2293] 2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) ** in thread 7f5c44d71700 thread_name:ms_dispatch ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (()+0x153c0) [0x7f5c51b723c0] 2: (gsignal()+0xcb) [0x7f5c515f618b] 3: (abort()+0x12b) [0x7f5c515d5859] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6] 5: (()+0x9e8a63) [0x5579a7c36a63] 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, 
std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 19: (()+0x9609) [0x7f5c51b66609] 20: (clone()+0x43) [0x7f5c516d2293] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. -964> 2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface -963> 2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -962> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -961> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -960> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, 
expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -959> 2022-08-20T19:20:46.742+0000 7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true} -958> 2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes -957> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -956> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -955> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -954> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -953> 2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read) ceph version 15.2.13 
(c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b] 2: (()+0x9e8a63) [0x5579a7c36a63] 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 14: 
(DispatchQueue::entry()+0x58f) [0x5579a8757def] 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 16: (()+0x9609) [0x7f5c51b66609] 17: (clone()+0x43) [0x7f5c516d2293] -952> 2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) ** in thread 7f5c44d71700 thread_name:ms_dispatch ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (()+0x153c0) [0x7f5c51b723c0] 2: (gsignal()+0xcb) [0x7f5c515f618b] 3: (abort()+0x12b) [0x7f5c515d5859] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6] 5: (()+0x9e8a63) [0x5579a7c36a63] 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 12: 
(ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 19: (()+0x9609) [0x7f5c51b66609] 20: (clone()+0x43) [0x7f5c516d2293] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. -685> 2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface -205> 2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -204> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -203> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -202> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head# -142> 2022-08-20T19:20:46.742+0000 
7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true} -6> 2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes -5> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -4> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -3> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -2> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head# -1> 2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read) ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b] 2: (()+0x9e8a63) [0x5579a7c36a63] 3: 
(BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972] 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb] 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104] 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139] 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def] 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1] 16: (()+0x9609) [0x7f5c51b66609] 17: (clone()+0x43) [0x7f5c516d2293] 0> 
2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) ** in thread 7f5c44d71700 thread_name:ms_dispatch ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) 1: (()+0x153c0) [0x7f5c51b723c0] 2: (gsignal()+0xcb) [0x7f5c515f618b] 3: (abort()+0x12b) [0x7f5c515d5859] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6] 5: (()+0x9e8a63) [0x5579a7c36a63] 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f] 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e] 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1] 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0] 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e] 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed] 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67] 13: 
(OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
17: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
19: (()+0x9609) [0x7f5c51b66609]
20: (clone()+0x43) [0x7f5c516d2293]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any help would be appreciated. The drives themselves look to be healthy, as far as I can tell, and it would seem odd to me for all of them to fail at the exact same time. If I need to replace the drives, how do I remove them from the cluster without losing data? There should be enough space on the remaining drives as long as the utilized space doesn't keep consistently climbing.

--
Thanks,
Joshua Schaeffer
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
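
For reference, the reweight-then-balance suggestion at the top of this thread could be sketched roughly as follows. This is only an illustration under assumptions: the OSD ids (5 and 6), the 0.85 override weight, and the `CEPH_APPLY` dry-run switch are placeholders invented for the example, not values recommended anywhere in the thread. The `cloudstack` pool name is taken from the `ceph df` output above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): print each command unless
# CEPH_APPLY=1 is set, in which case the command is executed.
run() {
    if [ "${CEPH_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Apply one override weight (0.0 to 1.0) to a list of OSD ids.
# Usage: reweight_osds WEIGHT ID [ID...]
reweight_osds() {
    weight=$1
    shift
    for id in "$@"; do
        run ceph osd reweight "$id" "$weight"
    done
}

# 1. Look at per-OSD utilization and at the pool's replication size
#    (the "size" Brent asks about).
run ceph osd df tree
run ceph osd pool get cloudstack size

# 2. Nudge data away from the fullest OSDs. Ids and weight are
#    placeholders; adjust in small steps and re-check utilization.
reweight_osds 0.85 5 6

# 3. Once the distribution looks reasonable, hand over to the balancer.
#    upmap mode requires all clients to be Luminous or newer.
run ceph balancer mode upmap
run ceph balancer on
run ceph balancer status
```

Running it as-is prints the command list for review; setting `CEPH_APPLY=1` would execute the same commands against the cluster. Note that reweighting only moves data among OSDs that are up, so it does not by itself address the checksum errors and assertion failure shown in the log.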