Hi!

Right now, after adding an OSD:

# ceph health detail
HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data redundancy (low space): 1 pg backfill_toofull
OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting [21,0,47]

# ceph pg ls-by-pool iscsi backfill_toofull
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES      LOG  STATE                                          STATE_STAMP                VERSION   REPORTED   UP         ACTING       SCRUB_STAMP                DEEP_SCRUB_STAMP
6.eb 645     0        1290      0       1645654016 3067 active+remapped+backfill_wait+backfill_toofull 2019-02-02 00:20:32.975300 7208'6567 9790:16214 [5,1,21]p5 [21,0,47]p21 2019-01-18 04:13:54.280495 2019-01-18 04:13:54.280495

All OSDs are below 40% used:

ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 9.56149  1.00000 9.6 TiB 3.2 TiB 6.3 TiB 33.64 1.31 313
 1   hdd 9.56149  1.00000 9.6 TiB 3.3 TiB 6.3 TiB 34.13 1.33 295
 5   hdd 9.56149  1.00000 9.6 TiB 756 GiB 8.8 TiB  7.72 0.30 103
47   hdd 9.32390  1.00000 9.3 TiB 3.1 TiB 6.2 TiB 33.75 1.31 306
(all other OSDs are also below 40%)

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

Perhaps the developers will notice this email and say something?

----- Original Message -----
From: "Fyodor Ustinov" <ufm@xxxxxx>
To: "Caspar Smit" <casparsmit@xxxxxxxxxxx>
Cc: "Jan Kasprzak" <kas@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, 31 January, 2019 16:50:24
Subject: Re: backfill_toofull after adding new OSDs

Hi!

I have seen the same thing several times when adding a new OSD to the cluster: one or two PGs in the "backfill_toofull" state, on every version of mimic.

----- Original Message -----
From: "Caspar Smit" <casparsmit@xxxxxxxxxxx>
To: "Jan Kasprzak" <kas@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, 31 January, 2019 15:43:07
Subject: Re: backfill_toofull after adding new OSDs

Hi Jan,

You might be hitting the same issue as Wido here:
https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards,
Caspar

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <kas@xxxxxxxxxx> wrote:

Hello, ceph users,

I see the following HEALTH_ERR during cluster rebalance:

Degraded data redundancy (low space): 8 pgs backfill_toofull

Detailed description: I have upgraded my cluster to mimic and added 16 new bluestore OSDs on 4 hosts. The hosts are in a separate region in my crush map, and crush rules prevented data from being moved onto the new OSDs. Now I want to move all data to the new OSDs (and possibly decommission the old filestore OSDs). I have created the following rule:

# ceph osd crush rule create-replicated on-newhosts newhostsroot host

After this, I am slowly moving the pools one by one to this new rule:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts

When I do this, I get the above error. This is misleading, because ceph osd df does not suggest the OSDs are getting full (the most full OSD is about 41% full). After rebalancing is done, the HEALTH_ERR disappears. Why am I getting this error?
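A brief aside before the status output (hedged: the values shown below are the stock mimic defaults, not confirmed from this cluster): as far as I understand, backfill_toofull is evaluated against the backfillfull_ratio threshold rather than the %USE column of ceph osd df, so it is worth checking what the cluster is actually configured with:

# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85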
# ceph -s
  cluster:
    id:     ...my UUID...
    health: HEALTH_ERR
            1271/3803223 objects misplaced (0.033%)
            Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs degraded, 67 pgs undersized
            Degraded data redundancy (low space): 8 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1, mon3
    osd: 80 osds: 80 up, 80 in; 90 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   13 pools, 5056 pgs
    objects: 1.27 M objects, 4.8 TiB
    usage:   15 TiB used, 208 TiB / 224 TiB avail
    pgs:     40124/3803223 objects degraded (1.055%)
             1271/3803223 objects misplaced (0.033%)
             4963 active+clean
             41   active+recovery_wait+undersized+degraded+remapped
             21   active+recovery_wait+undersized+degraded
             17   active+remapped+backfill_wait
             5    active+remapped+backfill_wait+backfill_toofull
             3    active+remapped+backfill_toofull
             2    active+recovering+undersized+remapped
             2    active+recovering+undersized+degraded+remapped
             1    active+clean+remapped
             1    active+recovering+undersized+degraded

  io:
    client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
    recovery: 2.0 MiB/s, 92 objects/s

Thanks for any hint,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
  This is the world we live in: the way to deal with computers is to google
  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
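A closing note for anyone who lands on this thread with a PG stuck in backfill_toofull on a cluster that is nowhere near full: one workaround sometimes used (a sketch under the assumption that the backfillfull threshold itself is what is blocking the reservation; it was not confirmed by anyone in this thread) is to raise the ratio slightly so the pending backfill can be granted, then restore the default afterwards:

# ceph osd set-backfillfull-ratio 0.92
(wait until "ceph pg ls backfill_toofull" no longer lists the affected PG)
# ceph osd set-backfillfull-ratio 0.90

This only makes sense when ceph osd df confirms there is ample free space, as it does here; on a genuinely full cluster, raising the ratio risks filling an OSD completely.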