We have a Ceph cluster of NVMe drives only. Very recently our overall OSD write latency increased pretty dramatically and our overall throughput has decreased noticeably. One thing that seems to correlate with the start of this problem is the ERROR line below; every one of our OSD nodes is now logging lines like this. Can anyone tell me what this might be telling us? Any and all help is greatly appreciated.

Mar 31 23:21:56 ceph1d03 ceph-8797e570-96be-11ed-b022-506b4b7d76e1-osd-46[12898]: debug 2024-04-01T03:21:56.953+0000 7effbba51700  0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/cls/fifo/cls_fifo.cc:112: ERROR: int rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, rados::cls::fifo::part_header*): failed decoding part header
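For anyone who wants to see how widespread this is on their own nodes, here is a rough sketch of how one might tally these errors per OSD. It assumes the containerized OSDs log to journald (as ours appear to, given the journal-style line above) and just parses the OSD id out of the container unit name; the time window and message string are illustrative, not anything Ceph-specific.

#!/usr/bin/env python3
"""Tally cls_fifo 'failed decoding part header' errors per OSD.

A minimal sketch, assuming the OSD containers log to journald and the
unit names follow the ceph-<fsid>-osd-<id> pattern seen above.
"""
import re
import subprocess
from collections import Counter

# Pull the last day of journal entries that mention the cls_fifo error.
# (journalctl may exit non-zero when nothing matches, so we don't check
# the return code.)
result = subprocess.run(
    ["journalctl", "--since", "1 day ago", "--no-pager",
     "--grep=failed decoding part header"],
    capture_output=True, text=True,
)

# Lines look like: "... ceph-<fsid>-osd-46[12898]: debug ... ERROR: ..."
osd_re = re.compile(r"-osd-(\d+)\[")

counts = Counter(m.group(1) for m in osd_re.finditer(result.stdout))
for osd, n in counts.most_common():
    print(f"osd.{osd}: {n} errors in the last 24h")

Run per node (or fan it out with ssh). Whether the counts are concentrated on a few OSDs or spread across all of them should at least indicate whether the objects that fail to decode live in a handful of PGs or are scattered cluster-wide.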
--
Mark Selby
Sr Linux Administrator, The Voleon Group
mselby@xxxxxxxxxx