Ceph disks fill up to 100%

I don't have a ton of experience troubleshooting Ceph issues, and I'm running into a problem where the OSDs fill up to 100% and are then marked down in the cluster. The cluster is in HEALTH_WARN right now and I'm not sure exactly where I should be looking:

user@ceph01:~$ sudo ceph status
  cluster:
    id:     7138ab3f-ee99-42ac-9f3d-42c460f0b6a1
    health: HEALTH_WARN
            1 osds down
            Reduced data availability: 4 pgs inactive
            Degraded data redundancy: 9889/39939 objects degraded (24.760%), 92 pgs degraded, 125 pgs undersized
            129 pgs not deep-scrubbed in time
            129 pgs not scrubbed in time
            4 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 2w)
    mgr: b(active, since 2w), standbys: c, a
    osd: 16 osds: 10 up (since 18h), 11 in (since 7m); 40 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 13.31k objects, 51 GiB
    usage:   113 GiB used, 887 GiB / 1000 GiB avail
    pgs:     3.101% pgs not active
             9889/39939 objects degraded (24.760%)
             7398/39939 objects misplaced (18.523%)
             88 active+undersized+degraded
             36 active+undersized+remapped
             2 undersized+degraded+remapped+backfill_wait+peered
             2  undersized+degraded+remapped+backfilling+peered
             1  active+undersized

  io:
    client:   116 KiB/s rd, 981 KiB/s wr, 76 op/s rd, 64 op/s wr
    recovery: 22 MiB/s, 5 objects/s

Each of the "down" OSDs is showing 100% disk utilization, and disk utilization on the drives that are still up is getting dangerously high:

user@ceph01:~$ sudo ceph df
--- RAW STORAGE ---
CLASS  SIZE      AVAIL    USED     RAW USED  %RAW USED
hdd    1000 GiB  886 GiB  104 GiB   114 GiB      11.38
TOTAL  1000 GiB  886 GiB  104 GiB   114 GiB      11.38

--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED     %USED MAX AVAIL
device_health_metrics   1    1     0 B        0      0 B 0    425 GiB
cloudstack              3  128  66 GiB   13.32k  151 GiB 10.57    560 GiB

user@ceph01:~$ sudo ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT PRI-AFF
-1         1.56299  root default
-3         0.39075      host bllcloudceph01
 1    hdd  0.09769          osd.1              down         0 1.00000
 2    hdd  0.09769          osd.2              down         0 1.00000
 3    hdd  0.09769          osd.3              down         0 1.00000
 4    hdd  0.09769          osd.4              down         0 1.00000
-5         0.58612      host bllcloudceph02
 5    hdd  0.09769          osd.5                up   1.00000 1.00000
 6    hdd  0.09769          osd.6                up   1.00000 1.00000
 7    hdd  0.09769          osd.7                up   1.00000 1.00000
 8    hdd  0.09769          osd.8              down         0 1.00000
 9    hdd  0.09769          osd.9                up   1.00000 1.00000
10    hdd  0.09769          osd.10               up   1.00000 1.00000
-7         0.58612      host bllcloudceph03
11    hdd  0.09769          osd.11               up   1.00000 1.00000
12    hdd  0.09769          osd.12               up   1.00000 1.00000
13    hdd  0.09769          osd.13               up   1.00000 1.00000
14    hdd  0.09769          osd.14               up   1.00000 1.00000
15    hdd  0.09769          osd.15               up   1.00000 1.00000
16    hdd  0.09769          osd.16             down   1.00000 1.00000

jschaeffer@bllcloudceph01:~$ sudo df -h | grep ceph
/dev/sdc                                           69G   69G 16K 100% /var/lib/ceph/osd/ceph-0
/dev/sde                                           69G   69G 16K 100% /var/lib/ceph/osd/ceph-2
/dev/sdf                                           69G   69G 16K 100% /var/lib/ceph/osd/ceph-3
/dev/sdh                                           69G   69G 20K 100% /var/lib/ceph/osd/ceph-4
/dev/sdd                                           69G   69G 16K 100% /var/lib/ceph/osd/ceph-1

The cluster itself shouldn't be holding that much data (less than 200 GB).
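
For anyone wanting to watch per-OSD fullness rather than the cluster-wide total, here is a minimal Python sketch that labels each OSD from the JSON output of `ceph osd df --format json`. It assumes each entry under "nodes" carries "name" and "utilization" (a percentage), and it uses Ceph's default ratios (nearfull 0.85, backfillfull 0.90, full 0.95), which a cluster may have overridden. The sample input is made up and is not taken from this cluster.

```python
import json

# Ceph's default fullness ratios; a cluster may override them,
# so treat these values as assumptions.
NEARFULL, BACKFILLFULL, FULL = 0.85, 0.90, 0.95


def classify_osds(osd_df_json):
    """Map OSD name -> fullness label from `ceph osd df --format json` text.

    Assumes each entry under "nodes" has "name" and "utilization"
    (a percentage); verify the field names on your Ceph version.
    """
    labels = {}
    for node in json.loads(osd_df_json).get("nodes", []):
        ratio = node["utilization"] / 100.0
        if ratio >= FULL:
            labels[node["name"]] = "full"
        elif ratio >= BACKFILLFULL:
            labels[node["name"]] = "backfillfull"
        elif ratio >= NEARFULL:
            labels[node["name"]] = "nearfull"
        else:
            labels[node["name"]] = "ok"
    return labels


# Made-up sample input for illustration only.
SAMPLE = json.dumps({"nodes": [
    {"name": "osd.5", "utilization": 42.0},
    {"name": "osd.6", "utilization": 87.5},
    {"name": "osd.7", "utilization": 96.1},
]})

if __name__ == "__main__":
    for name, label in classify_osds(SAMPLE).items():
        print(name, label)
```

In practice the input would come from something like `sudo ceph osd df --format json`, piped or saved to a file, instead of SAMPLE.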

When I try to manually start one of these OSDs, it throws a lot of errors:

user@ceph01:~$ sudo ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface
2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:46.742+0000 7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true}
2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
/build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000
/build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)
 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b]
 2: (()+0x9e8a63) [0x5579a7c36a63]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 16: (()+0x9609) [0x7f5c51b66609]
 17: (clone()+0x43) [0x7f5c516d2293]
*** Caught signal (Aborted) **
 in thread 7f5c44d71700 thread_name:ms_dispatch
2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000
/build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b]
 2: (()+0x9e8a63) [0x5579a7c36a63]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 16: (()+0x9609) [0x7f5c51b66609]
 17: (clone()+0x43) [0x7f5c516d2293]

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (()+0x153c0) [0x7f5c51b723c0]
 2: (gsignal()+0xcb) [0x7f5c515f618b]
 3: (abort()+0x12b) [0x7f5c515d5859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6]
 5: (()+0x9e8a63) [0x5579a7c36a63]
 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 19: (()+0x9609) [0x7f5c51b66609]
 20: (clone()+0x43) [0x7f5c516d2293]
2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) **
 in thread 7f5c44d71700 thread_name:ms_dispatch

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (()+0x153c0) [0x7f5c51b723c0]
 2: (gsignal()+0xcb) [0x7f5c515f618b]
 3: (abort()+0x12b) [0x7f5c515d5859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6]
 5: (()+0x9e8a63) [0x5579a7c36a63]
 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 19: (()+0x9609) [0x7f5c51b66609]
 20: (clone()+0x43) [0x7f5c516d2293]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  -964> 2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface
  -963> 2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -962> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -961> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -960> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -959> 2022-08-20T19:20:46.742+0000 7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true}
  -958> 2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes
  -957> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
  -956> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
  -955> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
  -954> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
  -953> 2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000
/build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b]
 2: (()+0x9e8a63) [0x5579a7c36a63]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 16: (()+0x9609) [0x7f5c51b66609]
 17: (clone()+0x43) [0x7f5c516d2293]

  -952> 2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) **
 in thread 7f5c44d71700 thread_name:ms_dispatch

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (()+0x153c0) [0x7f5c51b723c0]
 2: (gsignal()+0xcb) [0x7f5c515f618b]
 3: (abort()+0x12b) [0x7f5c515d5859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6]
 5: (()+0x9e8a63) [0x5579a7c36a63]
 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 19: (()+0x9609) [0x7f5c51b66609]
 20: (clone()+0x43) [0x7f5c516d2293]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  -685> 2022-08-20T19:20:43.402+0000 7f5c514c8d80 -1 Falling back to public interface
  -205> 2022-08-20T19:20:45.066+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -204> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -203> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -202> 2022-08-20T19:20:45.070+0000 7f5c514c8d80 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x0~1000, object #-1:46e858f0:::pg_num_history:head#
  -142> 2022-08-20T19:20:46.742+0000 7f5c514c8d80 -1 osd.1 9319 log_to_monitors {default=true}
    -6> 2022-08-20T19:20:47.806+0000 7f5c44d71700 -1 osd.1 9319 failed to load OSD map for epoch 9849, got 0 bytes
    -5> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
    -4> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
    -3> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
    -2> 2022-08-20T19:20:47.818+0000 7f5c44d71700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xa106a010, expected 0xbea1b1f4, device location [0x180000~1000], logical extent 0x12~1000, object #-1:46e858f0:::pg_num_history:head#
    -1> 2022-08-20T19:20:47.854+0000 7f5c44d71700 -1 /build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)' thread 7f5c44d71700 time 2022-08-20T19:20:47.822988+0000
/build/ceph-15.2.13/src/os/bluestore/BlueStore.cc: 13571: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x5579a7c3685b]
 2: (()+0x9e8a63) [0x5579a7c36a63]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 10: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 11: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 12: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 14: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 16: (()+0x9609) [0x7f5c51b66609]
 17: (clone()+0x43) [0x7f5c516d2293]

     0> 2022-08-20T19:20:47.894+0000 7f5c44d71700 -1 *** Caught signal (Aborted) **
 in thread 7f5c44d71700 thread_name:ms_dispatch

 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
 1: (()+0x153c0) [0x7f5c51b723c0]
 2: (gsignal()+0xcb) [0x7f5c515f618b]
 3: (abort()+0x12b) [0x7f5c515d5859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x5579a7c368b6]
 5: (()+0x9e8a63) [0x5579a7c36a63]
 6: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x3c0f) [0x5579a826f70f]
 7: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x23e) [0x5579a827006e]
 8: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2d1) [0x5579a82781f1]
 9: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xe0) [0x5579a82792a0]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x199e) [0x5579a827cc1e]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ad) [0x5579a827deed]
 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x5579a7d5cb67]
 13: (OSD::handle_osd_map(MOSDMap*)+0x2472) [0x5579a7cf9972]
 14: (OSD::_dispatch(Message*)+0x18b) [0x5579a7d1bdbb]
 15: (OSD::ms_dispatch(Message*)+0x84) [0x5579a7d1c104]
 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x5579a8759139]
 17: (DispatchQueue::entry()+0x58f) [0x5579a8757def]
 18: (DispatchQueue::DispatchThread::entry()+0x11) [0x5579a85861f1]
 19: (()+0x9609) [0x7f5c51b66609]
 20: (clone()+0x43) [0x7f5c516d2293]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any help would be appreciated. The drives themselves look healthy, as far as I can tell, and it would seem odd to me for all of them to fail at the exact same time. If I need to replace the drives, how do I remove them from the cluster without losing data? There should be enough space on the remaining drives as long as the utilized space doesn't keep consistently climbing.
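
As a rough check of the "enough space" estimate, the arithmetic can be sketched in Python. The inputs are the 66 GiB stored in the cloudstack pool (from `ceph df`) and 10 up OSDs of roughly 100 GiB each (CRUSH weight 0.09769 TiB). The replica count of 3 is an assumption, since the pool size is not shown above, and the helper name is hypothetical. The sketch ignores CRUSH failure domains and uneven data placement, so it gives only an upper bound on what fits.

```python
def fits_on_remaining(stored_gib, replicas, osd_count, osd_size_gib, nearfull=0.85):
    """Return (raw_needed_gib, usable_gib, fits) for a replicated pool.

    usable_gib is the raw capacity of the remaining OSDs scaled by the
    nearfull ratio (0.85 is Ceph's default and may differ per cluster).
    """
    raw_needed = stored_gib * replicas
    usable = osd_count * osd_size_gib * nearfull
    return raw_needed, usable, raw_needed <= usable


# 66 GiB stored and 10 OSDs of ~100 GiB come from the output above;
# replicas=3 is an assumption, not something the output confirms.
raw, usable, ok = fits_on_remaining(66, 3, 10, 100)
print(f"raw needed: {raw} GiB, usable below nearfull: {usable:.0f} GiB, fits: {ok}")
```

Under these assumptions the data fits with a wide margin, so this estimate alone does not account for individual OSDs reaching 100%.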

--
Thanks,
Joshua Schaeffer