Hello all,

I am troubleshooting a Ceph cluster (version 18.2.2) where users are reporting slow and blocked reads and writes. When I run "ceph status" I see many health warnings:

  cluster:
    id:     cc881230-e0dd-11ee-aa9e-37c4e4e5e14b
    health: HEALTH_WARN
            6 clients failing to respond to capability release
            2 clients failing to advance oldest client/flush tid
            1 MDSs report slow requests
            1 MDSs behind on trimming
            Too many repaired reads on 11 OSDs
            Degraded data redundancy: 2 pgs degraded
            105 pgs not deep-scrubbed in time
            109 pgs not scrubbed in time
            1 mgr modules have recently crashed
            12 slow ops, oldest one blocked for 97678 sec, daemons [osd.11,osd.12,osd.15,osd.16,osd.19,osd.20,osd.28,osd.3,osd.32,osd.34]... have slow ops.

  services:
    mon: 3 daemons, quorum file03-xx,file04-xx,file05-xx (age 17h)
    mgr: file03-xx.xxxxxx(active, since 2w), standbys: file04-xx.xxxxxx
    mds: 1/1 daemons up, 1 standby
    osd: 44 osds: 44 up (since 17h), 44 in (since 39h); 492 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 2065 pgs
    objects: 66.44M objects, 140 TiB
    usage:   281 TiB used, 304 TiB / 586 TiB avail
    pgs:     16511162/134215883 objects misplaced (12.302%)
             1508 active+clean
             487  active+remapped+backfill_wait
             53   active+clean+scrubbing+deep
             8    active+clean+scrubbing
             5    active+remapped+backfilling
             2    active+recovering+degraded+repair
             2    active+recovering+repair

  io:
    recovery: 47 MiB/s, 37 objects/s

When I check the output of `ceph -w`, it is flooded with CRC error messages like the examples below:

2024-04-24T19:15:40.430334+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head
2024-04-24T19:15:40.430507+0000 osd.39 [ERR] 3.270 full-object read crc 0xa1bc3a1e != expected 0xffffffff on 3:0e44aa2f:::1000265a625.00000003:head
2024-04-24T19:15:40.494249+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.529771+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head
2024-04-24T19:15:40.582128+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.583350+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.662945+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.698197+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.699389+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.769191+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head
2024-04-24T19:15:40.834344+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head
2024-04-24T19:15:40.835513+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head

I suspect these CRC errors are the main issue affecting the cluster's health and performance, so I am trying to address them first.
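In case it is useful, this is roughly the diagnostic plan I had in mind before touching anything; it is only a sketch, the PG and OSD IDs (3.566, osd.19) are simply taken from the `ceph -w` excerpt above, and <crash-id> is a placeholder for whatever `ceph crash ls` reports:

  # see which OSDs are behind the OSD_TOO_MANY_REPAIRS and slow-ops warnings
  ceph health detail

  # see which PGs are currently in the repair state
  ceph pg ls repair

  # inspect one of the PGs that keeps logging read CRC errors
  rados list-inconsistent-obj 3.566 --format=json-pretty
  ceph pg deep-scrub 3.566

  # look at the ops that have been blocked for a long time
  # (run on the host where osd.19 lives)
  ceph daemon osd.19 dump_ops_in_flight

  # check which mgr module recently crashed
  ceph crash ls
  ceph crash info <crash-id>

I have deliberately not issued `ceph pg repair` on these PGs yet, because I am not sure it is the right next step here or whether it could make things worse.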
The "expected 0xffffffff" crc seems like a bug to me and I found an open ticket (https://tracker.ceph.com/issues/53240) with similar error messages but I am not sure this is related to my case. Could someone point me to the steps to solve these errors? Cheers, -- Fabio _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx