Hello all,

I am troubleshooting a Ceph cluster (version 18.2.2) where users are reporting slow and blocked reads and writes. When I run "ceph status" I see many health warnings:

  cluster:
    id:     cc881230-e0dd-11ee-aa9e-37c4e4e5e14b
    health: HEALTH_WARN
            6 clients failing to respond to capability release
            2 clients failing to advance oldest client/flush tid
            1 MDSs report slow requests
            1 MDSs behind on trimming
            Too many repaired reads on 11 OSDs
            Degraded data redundancy: 2 pgs degraded
            105 pgs not deep-scrubbed in time
            109 pgs not scrubbed in time
            1 mgr modules have recently crashed
            12 slow ops, oldest one blocked for 97678 sec, daemons [osd.11,osd.12,osd.15,osd.16,osd.19,osd.20,osd.28,osd.3,osd.32,osd.34]... have slow ops.

  services:
    mon: 3 daemons, quorum file03-xx,file04-xx,file05-xx (age 17h)
    mgr: file03-xx.xxxxxx(active, since 2w), standbys: file04-xx.xxxxxx
    mds: 1/1 daemons up, 1 standby
    osd: 44 osds: 44 up (since 17h), 44 in (since 39h); 492 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 2065 pgs
    objects: 66.44M objects, 140 TiB
    usage:   281 TiB used, 304 TiB / 586 TiB avail
    pgs:     16511162/134215883 objects misplaced (12.302%)
             1508 active+clean
             487  active+remapped+backfill_wait
             53   active+clean+scrubbing+deep
             8    active+clean+scrubbing
             5    active+remapped+backfilling
             2    active+recovering+degraded+repair
             2    active+recovering+repair

  io:
    recovery: 47 MiB/s, 37 objects/s

When I check the output of `ceph -w`, it is flooded with CRC error messages like the examples below:

2024-04-24T19:15:40.430334+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head
2024-04-24T19:15:40.430507+0000 osd.39 [ERR] 3.270 full-object read crc 0xa1bc3a1e != expected 0xffffffff on 3:0e44aa2f:::1000265a625.00000003:head
2024-04-24T19:15:40.494249+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.529771+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head
2024-04-24T19:15:40.582128+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.583350+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.662945+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.698197+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.699389+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.769191+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.834344+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.835513+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head

I suspect these CRC errors are the main issue affecting the cluster's health and performance, so I am trying to address them first.
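In case it is useful, this is roughly the diagnostic plan I had in mind before touching anything; it is only a sketch, the PG and OSD IDs (3.566, osd.19) are simply taken from the `ceph -w` excerpt above, and <crash-id> is a placeholder for whatever `ceph crash ls` reports:

  # see which OSDs are behind the OSD_TOO_MANY_REPAIRS and slow-ops warnings
  ceph health detail

  # see which PGs are currently in the repair state
  ceph pg ls repair

  # inspect one of the PGs that keeps logging read CRC errors
  rados list-inconsistent-obj 3.566 --format=json-pretty
  ceph pg deep-scrub 3.566

  # look at the ops that have been blocked for a long time
  # (run on the host where osd.19 lives)
  ceph daemon osd.19 dump_ops_in_flight

  # check which mgr module recently crashed
  ceph crash ls
  ceph crash info <crash-id>

I have deliberately not issued `ceph pg repair` on these PGs yet, because I am not sure it is the right next step here or whether it could make things worse.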
The "expected 0xffffffff" crc seems like a bug to me and I found an open ticket (https://tracker.ceph.com/issues/53240) with similar error messages but I am not sure this is related to my case. Could someone point me to the steps to solve these errors? Cheers, -- Fabio _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx