We have a Ceph cluster of NVMe drives only. Very recently our overall OSD write latency increased pretty dramatically and our overall throughput has decreased noticeably. One thing that seems to correlate with the start of this problem is the ERROR line below; every one of our OSD nodes is now logging lines like this. Can anyone tell me what this might be telling us? Any and all help is greatly appreciated.

Mar 31 23:21:56 ceph1d03 ceph-8797e570-96be-11ed-b022-506b4b7d76e1-osd-46[12898]: debug 2024-04-01T03:21:56.953+0000 7effbba51700  0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/cls/fifo/cls_fifo.cc:112: ERROR: int rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, rados::cls::fifo::part_header*): failed decoding part header
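For anyone who wants to see how widespread this is on their own nodes, here is a rough sketch of how one might tally these errors per OSD. It assumes the containerized OSDs log to journald (as ours appear to, given the journal-style line above) and just parses the OSD id out of the container unit name; the time window and message string are illustrative, not anything Ceph-specific.

#!/usr/bin/env python3
"""Tally cls_fifo 'failed decoding part header' errors per OSD.

A minimal sketch, assuming the OSD containers log to journald and the
unit names follow the ceph-<fsid>-osd-<id> pattern seen above.
"""
import re
import subprocess
from collections import Counter

# Pull the last day of journal entries that mention the cls_fifo error.
# (journalctl may exit non-zero when nothing matches, so we don't check
# the return code.)
result = subprocess.run(
    ["journalctl", "--since", "1 day ago", "--no-pager",
     "--grep=failed decoding part header"],
    capture_output=True, text=True,
)

# Lines look like: "... ceph-<fsid>-osd-46[12898]: debug ... ERROR: ..."
osd_re = re.compile(r"-osd-(\d+)\[")

counts = Counter(m.group(1) for m in osd_re.finditer(result.stdout))
for osd, n in counts.most_common():
    print(f"osd.{osd}: {n} errors in the last 24h")

Run per node (or fan it out with ssh). Whether the counts are concentrated on a few OSDs or spread across all of them should at least indicate whether the objects that fail to decode live in a handful of PGs or are scattered cluster-wide.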
--
Mark Selby
Sr Linux Administrator, The Voleon Group
mselby@xxxxxxxxxx