https://tracker.ceph.com/issues/41255 is probably reporting the same issue.

On Thu, Aug 22, 2019 at 6:31 PM Lars Täuber <taeuber@xxxxxxx> wrote:
>
> Hi there!
>
> We also experience this behaviour of our cluster while it is moving pgs.
>
> # ceph health detail
> HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive; Degraded data redundancy (low space): 1 pg backfill_toofull
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>     mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 359 secs
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
>     pg 21.231 is stuck inactive for 878.224182, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
>     pg 21.240 is stuck inactive for 878.123932, current state remapped, last acting [26,17,21,20,2147483647,2147483647]
> PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
>     pg 21.376 is active+remapped+backfill_wait+backfill_toofull, acting [6,11,29,2,10,15]
> # ceph pg map 21.376
> osdmap e68016 pg 21.376 (21.376) -> up [6,5,23,21,10,11] acting [6,11,29,2,10,15]
>
> # ceph osd dump | fgrep ratio
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
>
> This happens while the cluster is rebalancing the pgs after I manually mark a single osd out.
> see here:
> Subject: pg 21.1f9 is stuck inactive for 53316.902820, current state remapped
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036634.html
>
> Mostly the cluster heals itself at least into state HEALTH_WARN:
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>     mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 1155 secs
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
>     pg 21.231 is stuck inactive for 1677.312219, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
>     pg 21.240 is stuck inactive for 1677.211969, current state remapped, last acting [26,17,21,20,2147483647,2147483647]
>
> Cheers,
> Lars
>
> Wed, 21 Aug 2019 17:28:05 -0500
> Reed Dier <reed.dier@xxxxxxxxxxx> ==> Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> :
> > Just chiming in to say that I too had some issues with backfill_toofull PGs, despite no OSD's being in a backfill_full state, albeit, there were some nearfull OSDs.
> >
> > I was able to get through it by reweighting down the OSD that was the target reported by ceph pg dump | grep 'backfill_toofull'.
> >
> > This was on 14.2.2.
> >
> > Reed
> >
> > > On Aug 21, 2019, at 2:50 PM, Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hello
> > >
> > > After increasing number of PGs in a pool, ceph status is reporting "Degraded data redundancy (low space): 1 pg backfill_toofull", but I don't understand why, because all OSDs seem to have enough space.
> > >
> > > ceph health detail says:
> > > pg 40.155 is active+remapped+backfill_toofull, acting [20,57,79,85]
> > >
> > > $ ceph pg map 40.155
> > > osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85]
> > >
> > > So I guess Ceph wants to move 40.155 from 66 to 79 (or other way around?). According to "osd df", OSD 66's utilization is 71.90%, OSD 79's utilization is 58.45%. The OSD with least free space in the cluster is 81.23% full, and it's not any of the ones above.
> > >
> > > OSD backfillfull_ratio is 90% (is there a better way to determine this?):
> > > $ ceph osd dump | grep ratio
> > > full_ratio 0.95
> > > backfillfull_ratio 0.9
> > > nearfull_ratio 0.7
> > >
> > > Does anybody know why a PG could be in the backfill_toofull state if no OSD is in the backfillfull state?
> > >
> > >
> > > Vlad
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> --
> Informationstechnologie
> Berlin-Brandenburgische Akademie der Wissenschaften
> Jägerstraße 22-23 10117 Berlin
> Tel.: +49 30 20370-352 http://www.bbaw.de

--
Cheers,
Brad
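
On the "from 66 to 79 (or other way around?)" question quoted above: the up set is where CRUSH wants the PG to live and the acting set is where it is currently served from, so for pg 40.155 the data would move from osd.79 to osd.66. Below is a minimal Python sketch that reads a `ceph pg map` line and lists the slots where the two sets differ. The helper names `parse_pg_map` and `pending_moves` are made up for illustration and are not part of any Ceph tooling. The comparison is done slot by slot, which is an assumption that fits erasure-coded pools such as pool 21 above; for replicated pools a plain set difference may be the better reading. 2147483647 is the placeholder Ceph prints when no OSD is mapped to a slot.

```python
import re

# Placeholder Ceph prints in up/acting sets when a slot has no OSD mapped.
CRUSH_ITEM_NONE = 2147483647

# Matches the tail of a `ceph pg map` line: "up [a,b,c] acting [d,e,f]".
PG_MAP_RE = re.compile(r"up \[([\d,]*)\] acting \[([\d,]*)\]")


def parse_pg_map(line):
    """Return (up, acting) as lists of OSD ids from one `ceph pg map` line."""
    match = PG_MAP_RE.search(line)
    if match is None:
        raise ValueError("no up/acting sets found in: " + line)

    def to_ids(text):
        return [int(part) for part in text.split(",") if part]

    return to_ids(match.group(1)), to_ids(match.group(2))


def pending_moves(up, acting):
    """List (slot, from_osd, to_osd) for each slot where up and acting differ.

    An unmapped slot (CRUSH_ITEM_NONE) is reported as None.
    """
    def osd_or_none(osd_id):
        return None if osd_id == CRUSH_ITEM_NONE else osd_id

    moves = []
    for slot, (target, current) in enumerate(zip(up, acting)):
        if target != current:
            moves.append((slot, osd_or_none(current), osd_or_none(target)))
    return moves


line = "osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85]"
up, acting = parse_pg_map(line)
for slot, src, dst in pending_moves(up, acting):
    print(f"slot {slot}: osd.{src} -> osd.{dst}")  # prints "slot 2: osd.79 -> osd.66"
```

Run against the 21.376 line from Lars's output, the same helpers report four differing slots, with osd.5, osd.23, osd.21 and osd.11 as the targets, so those would be the OSDs to look at in "ceph osd df" when a PG reports backfill_toofull.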