Thanks to everybody who responded. The problem was, indeed, that I hit the limit on the number of PGs per SSD OSD when I increased the number of PGs in a pool.

One question, though: should I have received a warning that some OSDs were close to their maximum PG limit? A while back, in a Luminous test pool, I remember seeing something like "too many PGs per OSD", but not this time (perhaps because this time I hit the limit during the resizing operation). Where might such a warning be recorded if not in "ceph status"?

Thanks,

Vlad

On 09/28/2018 01:04 PM, Paul Emmerich wrote:
> I guess the pool is mapped to SSDs only, judging from the name, and you
> only have 20 SSDs, so you should have roughly 2000 effective PGs, taking
> replication into account.
>
> Your pool has ~10k effective PGs with k+m=5, and you seem to have 5
> more pools...
>
> Check "ceph osd df tree" to see how many PGs per OSD you have.
>
> Try increasing these two options to "fix" it:
>
> mon max pg per osd
> osd max pg per osd hard ratio
>
>
> Paul
>
> On Fri, Sep 28, 2018 at 18:05, Vladimir Brik
> <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>>
>> Hello
>>
>> I've attempted to increase the number of placement groups of the pools
>> in our test cluster, and now ceph status (below) is reporting problems.
>> I am not sure what is going on or how to fix this. The troubleshooting
>> scenarios in the docs don't seem to quite match what I am seeing.
>>
>> I have no idea how to begin to debug this. I see OSDs listed in
>> "blocked_by" of pg dump, but I don't know how to interpret that. Could
>> somebody assist, please?
>>
>> I attached the output of "ceph pg dump_stuck -f json-pretty" just in case.
>>
>> The cluster consists of 5 hosts, each with 16 HDDs and 4 SSDs. I am
>> running 13.2.2.
>>
>> This is the affected pool:
>> pool 6 'fs-data-ec-ssd' erasure size 5 min_size 4 crush_rule 6
>> object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2493 lfor
>> 0/2491 flags hashpspool,ec_overwrites stripe_width 12288 application cephfs
>>
>>
>> Thanks,
>>
>> Vlad
>>
>>
>> ceph status
>>
>>   cluster:
>>     id:     47caa1df-42be-444d-b603-02cad2a7fdd3
>>     health: HEALTH_WARN
>>             Reduced data availability: 155 pgs inactive, 47 pgs peering,
>>             64 pgs stale
>>             Degraded data redundancy: 321039/114913606 objects degraded
>>             (0.279%), 108 pgs degraded, 108 pgs undersized
>>
>>   services:
>>     mon: 5 daemons, quorum ceph-1,ceph-2,ceph-3,ceph-4,ceph-5
>>     mgr: ceph-3(active), standbys: ceph-2, ceph-5, ceph-1, ceph-4
>>     mds: cephfs-1/1/1 up {0=ceph-5=up:active}, 4 up:standby
>>     osd: 100 osds: 100 up, 100 in; 165 remapped pgs
>>
>>   data:
>>     pools:   6 pools, 5120 pgs
>>     objects: 22.98 M objects, 88 TiB
>>     usage:   154 TiB used, 574 TiB / 727 TiB avail
>>     pgs:     3.027% pgs not active
>>              321039/114913606 objects degraded (0.279%)
>>              4903 active+clean
>>              105  activating+undersized+degraded+remapped
>>              61   stale+active+clean
>>              47   remapped+peering
>>              3    stale+activating+undersized+degraded+remapped
>>              1    active+clean+scrubbing+deep
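
For the archive, here is how the numbers work out as far as I can tell, assuming fs-data-ec-ssd maps only to the 20 SSD OSDs (the actual per-OSD counts are what "ceph osd df tree" reports in its PGS column):

    # Rough per-OSD load from this one pool alone:
    # 2048 PGs x 5 shards (k+m=5), spread over 20 SSD OSDs
    echo $((2048 * 5 / 20))     # => 512 PG shards per SSD OSD, before the other pools

    # Actual per-OSD counts (PGS column); filtering on the ssd device class
    # keeps the output short
    ceph osd df tree | grep -E 'PGS|ssd'

512 from this pool alone is already above the default mon_max_pg_per_osd (250, if I am reading the defaults right), and with the other pools on top it can cross the hard limit (mon_max_pg_per_osd times osd_max_pg_per_osd_hard_ratio, a ratio of 2 or 3 by default depending on the release), at which point OSDs stop accepting new PGs and they sit in "activating", which matches the status output above.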
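
About the missing warning: if I understand it correctly, the "too many PGs per OSD" check (TOO_MANY_PGS) does show up in "ceph status" and "ceph health detail", but it is computed from the cluster-wide average of PGs per OSD rather than the per-OSD maximum, so a cluster where only the 20 SSD OSDs are overloaded while the 80 HDD OSDs are nearly empty can stay under the threshold and never warn. These are just the places I would look; mon.ceph-1 stands in for whichever mon you run the admin socket command against, on that mon's host:

    # Structured health checks (TOO_MANY_PGS would appear here if the soft limit trips)
    ceph health detail

    # The soft limit the mon is actually using
    ceph daemon mon.ceph-1 config get mon_max_pg_per_osd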
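
And for the two options Paul mentioned, here is roughly how they map to commands. The values are only illustrative, not a recommendation; on 13.2 the centralized config store ("ceph config set") should work, and injectargs is the fallback on older releases (the hard ratio may still want an OSD restart to fully take effect):

    # Raise the soft limit and the hard-limit ratio (illustrative values)
    ceph config set global mon_max_pg_per_osd 400
    ceph config set global osd_max_pg_per_osd_hard_ratio 4

    # Runtime injection on clusters without the config store
    ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd=400'
    ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio=4'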