Hello,

I've attempted to increase the number of placement groups of the pools in our test cluster, and now ceph status (below) is reporting problems. I am not sure what is going on or how to fix this; the troubleshooting scenarios in the docs don't seem to quite match what I am seeing, and I have no idea how to begin to debug it. I see OSDs listed in "blocked_by" of pg dump, but I don't know how to interpret that (below the status output I have also pasted the commands I have been using to pull those entries out, in case I am reading them wrong). Could somebody assist, please? I have attached the output of "ceph pg dump_stuck -f json-pretty" just in case.

The cluster consists of 5 hosts, each with 16 HDDs and 4 SSDs. I am running 13.2.2.

This is the affected pool:

pool 6 'fs-data-ec-ssd' erasure size 5 min_size 4 crush_rule 6 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2493 lfor 0/2491 flags hashpspool,ec_overwrites stripe_width 12288 application cephfs

Thanks,
Vlad

ceph status:

  cluster:
    id:     47caa1df-42be-444d-b603-02cad2a7fdd3
    health: HEALTH_WARN
            Reduced data availability: 155 pgs inactive, 47 pgs peering, 64 pgs stale
            Degraded data redundancy: 321039/114913606 objects degraded (0.279%), 108 pgs degraded, 108 pgs undersized

  services:
    mon: 5 daemons, quorum ceph-1,ceph-2,ceph-3,ceph-4,ceph-5
    mgr: ceph-3(active), standbys: ceph-2, ceph-5, ceph-1, ceph-4
    mds: cephfs-1/1/1 up {0=ceph-5=up:active}, 4 up:standby
    osd: 100 osds: 100 up, 100 in; 165 remapped pgs

  data:
    pools:   6 pools, 5120 pgs
    objects: 22.98 M objects, 88 TiB
    usage:   154 TiB used, 574 TiB / 727 TiB avail
    pgs:     3.027% pgs not active
             321039/114913606 objects degraded (0.279%)
             4903 active+clean
             105  activating+undersized+degraded+remapped
             61   stale+active+clean
             47   remapped+peering
             3    stale+activating+undersized+degraded+remapped
             1    active+clean+scrubbing+deep
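For reference, this is roughly how I have been looking at blocked_by. The jq paths are guesses on my part, since I'm not sure how the per-PG entries are nested in this release, and "6.xx" is just a placeholder pgid, not a real one from my cluster:

# list PGs together with the OSDs they report as blocking them
# (recursive descent because I'm unsure of the exact JSON nesting)
ceph pg dump pgs -f json 2>/dev/null \
  | jq -r '.. | objects | select(has("blocked_by")) | select(.blocked_by != [])
           | "\(.pgid)  \(.state)  blocked_by=\(.blocked_by)"'

# count how often each OSD shows up as a blocker in the attached dump
zcat stuck.json.gz \
  | jq -r '.. | objects | select(has("blocked_by")) | .blocked_by[]' \
  | sort -n | uniq -c | sort -rn

# then query one of the affected PGs and look at its recovery_state section
ceph pg 6.xx query

Is that the right way to read blocked_by, or should I be looking somewhere else?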
Attachment: stuck.json.gz (application/gzip)