Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.

On Sat, Feb 2, 2019 at 11:52 AM Fyodor Ustinov <ufm@xxxxxx> wrote:
>
> Hi!
>
> Right now, after adding an OSD:
>
> # ceph health detail
> HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data redundancy (low space): 1 pg backfill_toofull
> OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
> PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
>     pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting [21,0,47]
>
> # ceph pg ls-by-pool iscsi backfill_toofull
> PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES      LOG  STATE                                          STATE_STAMP                VERSION   REPORTED   UP         ACTING       SCRUB_STAMP                DEEP_SCRUB_STAMP
> 6.eb     645        0      1290       0 1645654016 3067 active+remapped+backfill_wait+backfill_toofull 2019-02-02 00:20:32.975300 7208'6567 9790:16214 [5,1,21]p5 [21,0,47]p21 2019-01-18 04:13:54.280495 2019-01-18 04:13:54.280495
>
> All OSDs have less than 40% USE.
>
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
>  0   hdd 9.56149  1.00000 9.6 TiB 3.2 TiB 6.3 TiB 33.64 1.31 313
>  1   hdd 9.56149  1.00000 9.6 TiB 3.3 TiB 6.3 TiB 34.13 1.33 295
>  5   hdd 9.56149  1.00000 9.6 TiB 756 GiB 8.8 TiB  7.72 0.30 103
> 47   hdd 9.32390  1.00000 9.3 TiB 3.1 TiB 6.2 TiB 33.75 1.31 306
>
> (All other OSDs are also below 40%.)
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>
> Maybe the developers will pay attention to this email and say something?
>
> ----- Original Message -----
> From: "Fyodor Ustinov" <ufm@xxxxxx>
> To: "Caspar Smit" <casparsmit@xxxxxxxxxxx>
> Cc: "Jan Kasprzak" <kas@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 31 January, 2019 16:50:24
> Subject: Re: backfill_toofull after adding new OSDs
>
> Hi!
>
> I have seen the same thing several times when adding a new OSD to the
> cluster: one or two PGs in the "backfill_toofull" state.
>
> In all versions of mimic.
>
> ----- Original Message -----
> From: "Caspar Smit" <casparsmit@xxxxxxxxxxx>
> To: "Jan Kasprzak" <kas@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 31 January, 2019 15:43:07
> Subject: Re: backfill_toofull after adding new OSDs
>
> Hi Jan,
>
> You might be hitting the same issue as Wido here:
>
> https://www.spinics.net/lists/ceph-users/msg50603.html
>
> Kind regards,
> Caspar
>
> On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <kas@xxxxxxxxxx> wrote:
>
> Hello, ceph users,
>
> I see the following HEALTH_ERR during cluster rebalance:
>
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> Detailed description:
> I have upgraded my cluster to mimic and added 16 new bluestore OSDs
> on 4 hosts. The hosts are in a separate region in my crush map, and crush
> rules prevented data from being moved onto the new OSDs. Now I want to move
> all data to the new OSDs (and possibly decommission the old filestore OSDs).
> I have created the following rule:
>
> # ceph osd crush rule create-replicated on-newhosts newhostsroot host
>
> After this, I am slowly moving the pools one-by-one to this new rule:
>
> # ceph osd pool set test-hdd-pool crush_rule on-newhosts
>
> When I do this, I get the above error. This is misleading, because
> ceph osd df does not suggest the OSDs are getting full (the most full
> OSD is about 41% full). After rebalancing is done, the HEALTH_ERR
> disappears. Why am I getting this error?
>
> # ceph -s
>   cluster:
>     id:     ...my UUID...
>     health: HEALTH_ERR
>             1271/3803223 objects misplaced (0.033%)
>             Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs degraded, 67 pgs undersized
>             Degraded data redundancy (low space): 8 pgs backfill_toofull
>
>   services:
>     mon: 3 daemons, quorum mon1,mon2,mon3
>     mgr: mon2(active), standbys: mon1, mon3
>     osd: 80 osds: 80 up, 80 in; 90 remapped pgs
>     rgw: 1 daemon active
>
>   data:
>     pools:   13 pools, 5056 pgs
>     objects: 1.27 M objects, 4.8 TiB
>     usage:   15 TiB used, 208 TiB / 224 TiB avail
>     pgs:     40124/3803223 objects degraded (1.055%)
>              1271/3803223 objects misplaced (0.033%)
>              4963 active+clean
>              41   active+recovery_wait+undersized+degraded+remapped
>              21   active+recovery_wait+undersized+degraded
>              17   active+remapped+backfill_wait
>              5    active+remapped+backfill_wait+backfill_toofull
>              3    active+remapped+backfill_toofull
>              2    active+recovering+undersized+remapped
>              2    active+recovering+undersized+degraded+remapped
>              1    active+clean+remapped
>              1    active+recovering+undersized+degraded
>
>   io:
>     client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
>     recovery: 2.0 MiB/s, 92 objects/s
>
> Thanks for any hint,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | http://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5  |
> This is the world we live in: the way to deal with computers is to google
> the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
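
For anyone landing on this thread with the same symptom, one sanity
check worth doing before assuming a bug (a sketch only, not something
the posters above reported running): as far as I understand it, the
backfill_toofull state is evaluated against the backfillfull_ratio
stored in the OSDMap, not directly against the percentages shown by
ceph osd df, so it is worth confirming the configured ratios first:

# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

(The values shown are the usual mimic defaults; your cluster may
differ.) If backfillfull_ratio turns out to be unexpectedly low, it can
be raised with the corresponding mon command, for example:

# ceph osd set-backfillfull-ratio 0.9

If the ratios are at their defaults and the OSDs really are under 40%
used, as in the reports above, then the flag is most likely the
spurious backfill_toofull behaviour this thread is about rather than a
real space problem.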