The backfill_toofull state means that one PG which tried to backfill couldn't do so because the *target* OSD for the backfill didn't have enough free space (with a large buffer built in so we don't screw up!). It doesn't indicate anything about the overall state of the cluster, it will often resolve itself as the target OSD evacuates PGs of its own, and after a rebalance as large as yours it's not very surprising. :)
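In case it's useful for keeping an eye on it, here's a rough sketch of the commands I'd use to see which PG is flagged and which thresholds apply (assuming a Luminous-or-newer cluster; exact output varies by release):

  # show health detail and list the PG(s) currently in backfill_toofull
  ceph health detail
  ceph pg ls backfill_toofull

  # show the nearfull / backfillfull / full ratios the cluster is enforcing
  ceph osd dump | grep ratio

  # per-OSD utilization, to spot the backfill target that is short on space
  ceph osd df tree

Re-running the last two as backfill progresses should show the flag clear on its own.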
On Sat, Jul 28, 2018 at 5:50 AM Sebastian Igerl <igerlster@xxxxxxxxx> wrote:
Hi,

I added 4 more OSDs on my 4-node test cluster and now I'm in HEALTH_ERR state. Right now it's still recovering, but still, should this happen? None of my OSDs are full. Maybe I need more PGs? But since my %USE is < 40% it should still be OK to recover without HEALTH_ERR?

  data:
    pools:   7 pools, 484 pgs
    objects: 2.70 M objects, 10 TiB
    usage:   31 TiB used, 114 TiB / 146 TiB avail
    pgs:     2422839/8095065 objects misplaced (29.930%)
             343 active+clean
             101 active+remapped+backfill_wait
              39 active+remapped+backfilling
               1 active+remapped+backfill_wait+backfill_toofull

  io:
    recovery: 315 MiB/s, 78 objects/s

ceph osd df:

ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 2.72890 1.00000  2.7 TiB 975 GiB 1.8 TiB 34.89 1.62  31
 1   hdd 2.72899 1.00000  2.7 TiB 643 GiB 2.1 TiB 23.00 1.07  36
 8   hdd 7.27739 1.00000  7.3 TiB 1.7 TiB 5.5 TiB 23.85 1.11  83
12   hdd 7.27730 1.00000  7.3 TiB 1.1 TiB 6.2 TiB 14.85 0.69  81
16   hdd 7.27730 1.00000  7.3 TiB 2.0 TiB 5.3 TiB 27.68 1.29  74
20   hdd 9.09569 1.00000  9.1 TiB 108 GiB 9.0 TiB  1.16 0.05  43
 2   hdd 2.72899 1.00000  2.7 TiB 878 GiB 1.9 TiB 31.40 1.46  36
 3   hdd 2.72899 1.00000  2.7 TiB 783 GiB 2.0 TiB 28.02 1.30  39
 9   hdd 7.27739 1.00000  7.3 TiB 2.0 TiB 5.3 TiB 27.58 1.28  85
13   hdd 7.27730 1.00000  7.3 TiB 2.2 TiB 5.1 TiB 30.10 1.40  78
17   hdd 7.27730 1.00000  7.3 TiB 2.1 TiB 5.2 TiB 28.23 1.31  84
21   hdd 9.09569 1.00000  9.1 TiB 192 GiB 8.9 TiB  2.06 0.10  41
 4   hdd 2.72899 1.00000  2.7 TiB 927 GiB 1.8 TiB 33.18 1.54  34
 5   hdd 2.72899 1.00000  2.7 TiB 1.0 TiB 1.7 TiB 37.57 1.75  28
10   hdd 7.27739 1.00000  7.3 TiB 2.2 TiB 5.0 TiB 30.66 1.43  87
14   hdd 7.27730 1.00000  7.3 TiB 1.8 TiB 5.5 TiB 24.23 1.13  89
18   hdd 7.27730 1.00000  7.3 TiB 2.5 TiB 4.8 TiB 33.83 1.57  93
22   hdd 9.09569 1.00000  9.1 TiB 210 GiB 8.9 TiB  2.26 0.10  44
 6   hdd 2.72899 1.00000  2.7 TiB 350 GiB 2.4 TiB 12.51 0.58  21
 7   hdd 2.72899 1.00000  2.7 TiB 980 GiB 1.8 TiB 35.07 1.63  35
11   hdd 7.27739 1.00000  7.3 TiB 2.8 TiB 4.4 TiB 39.14 1.82  99
15   hdd 7.27730 1.00000  7.3 TiB 1.6 TiB 5.6 TiB 22.49 1.05  82
19   hdd 7.27730 1.00000  7.3 TiB 2.1 TiB 5.2 TiB 28.49 1.32  77
23   hdd 9.09569 1.00000  9.1 TiB 285 GiB 8.8 TiB  3.06 0.14  52
               TOTAL     146 TiB  31 TiB 114 TiB  21.51
MIN/MAX VAR: 0.05/1.82  STDDEV: 11.78

Right after adding the OSDs it showed degraded for a few minutes. Since all my pools have a redundancy of 3 and I'm only adding OSDs, I'm a bit confused why this happens. I get why it's misplaced, but why undersized and degraded?

  pgs: 4611/8095032 objects degraded (0.057%)
       2626460/8095032 objects misplaced (32.445%)
       215 active+clean
       192 active+remapped+backfill_wait
        26 active+recovering+undersized+remapped
        17 active+recovery_wait+undersized+degraded+remapped
        16 active+recovering
        11 active+recovery_wait+degraded
         6 active+remapped+backfilling
         1 active+remapped+backfill_toofull

Maybe someone can give me some pointers on what I'm missing to understand what's happening here?

Thanks!

Sebastian
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com