Hello,

I have a Proxmox Ceph cluster with 5 nodes and 3 OSDs each (15 OSDs total), on a 10G network. The cluster started small, and I have progressively added OSDs over time.

The problem is that the cluster never finishes rebalancing. Backfilling always makes progress, but PGs that were in the active+clean state jump back into the active+remapped+backfilling (or active+remapped+backfill_wait) state, to be moved to different OSDs.

Initially I had a 1G network, so I was holding back on the backfill settings (osd_max_backfills and osd_recovery_sleep_hdd). In the last few weeks I upgraded to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0 (only HDDs, no SSDs). The cluster has been backfilling for months now with no end in sight.

Is this normal behavior? Is there any setting I can look at that will give me an idea as to why PGs are jumping back from clean into remapped?

Below is the output of “ceph osd tree” and “ceph osd df”:

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         203.72472  root default
 -9          40.01666      host vis-hsw-01
  3    hdd   10.91309          osd.3            up   1.00000  1.00000
  6    hdd   14.55179          osd.6            up   1.00000  1.00000
 10    hdd   14.55179          osd.10           up   1.00000  1.00000
-13          40.01666      host vis-hsw-02
  0    hdd   10.91309          osd.0            up   1.00000  1.00000
  7    hdd   14.55179          osd.7            up   1.00000  1.00000
 11    hdd   14.55179          osd.11           up   1.00000  1.00000
-11          40.01666      host vis-hsw-03
  4    hdd   10.91309          osd.4            up   1.00000  1.00000
  8    hdd   14.55179          osd.8            up   1.00000  1.00000
 12    hdd   14.55179          osd.12           up   1.00000  1.00000
 -3          40.01666      host vis-hsw-04
  5    hdd   10.91309          osd.5            up   1.00000  1.00000
  9    hdd   14.55179          osd.9            up   1.00000  1.00000
 13    hdd   14.55179          osd.13           up   1.00000  1.00000
-15          43.65807      host vis-hsw-05
  1    hdd   14.55269          osd.1            up   1.00000  1.00000
  2    hdd   14.55269          osd.2            up   1.00000  1.00000
 14    hdd   14.55269          osd.14           up   1.00000  1.00000
 -5                 0      host vis-ivb-07
 -7                 0      host vis-ivb-10
#
# ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 3  hdd    10.91309  1.00000   11 TiB   8.2 TiB  8.2 TiB  552 MiB  25 GiB   2.7 TiB  75.08  1.19  131  up
 6  hdd    14.55179  1.00000   15 TiB   9.1 TiB  9.1 TiB  1.2 GiB  30 GiB   5.5 TiB  62.47  0.99  148  up
10  hdd    14.55179  1.00000   15 TiB   8.1 TiB  8.1 TiB  1.5 GiB  20 GiB   6.4 TiB  55.98  0.89  142  up
 0  hdd    10.91309  1.00000   11 TiB   7.5 TiB  7.4 TiB  504 MiB  24 GiB   3.5 TiB  68.34  1.09  120  up
 7  hdd    14.55179  1.00000   15 TiB   8.7 TiB  8.7 TiB  1.0 GiB  31 GiB   5.8 TiB  60.07  0.95  144  up
11  hdd    14.55179  1.00000   15 TiB   9.4 TiB  9.3 TiB  819 MiB  20 GiB   5.2 TiB  64.31  1.02  147  up
 4  hdd    10.91309  1.00000   11 TiB   7.0 TiB  7.0 TiB  284 MiB  25 GiB   3.9 TiB  64.35  1.02  112  up
 8  hdd    14.55179  1.00000   15 TiB   9.3 TiB  9.2 TiB  1.8 GiB  29 GiB   5.3 TiB  63.65  1.01  157  up
12  hdd    14.55179  1.00000   15 TiB   8.6 TiB  8.6 TiB  623 MiB  19 GiB   5.9 TiB  59.14  0.94  136  up
 5  hdd    10.91309  1.00000   11 TiB   8.6 TiB  8.6 TiB  542 MiB  29 GiB   2.3 TiB  79.01  1.26  134  up
 9  hdd    14.55179  1.00000   15 TiB   8.2 TiB  8.2 TiB  707 MiB  27 GiB   6.3 TiB  56.56  0.90  138  up
13  hdd    14.55179  1.00000   15 TiB   8.7 TiB  8.7 TiB  741 MiB  18 GiB   5.8 TiB  59.85  0.95  134  up
 1  hdd    14.55269  1.00000   15 TiB   9.8 TiB  9.8 TiB  1.3 GiB  20 GiB   4.8 TiB  67.18  1.07  158  up
 2  hdd    14.55269  1.00000   15 TiB   8.7 TiB  8.7 TiB  936 MiB  18 GiB   5.8 TiB  60.04  0.95  148  up
14  hdd    14.55269  1.00000   15 TiB   8.3 TiB  8.3 TiB  673 MiB  18 GiB   6.3 TiB  56.97  0.90  131  up
TOTAL                          204 TiB  128 TiB  128 TiB  13 GiB   350 GiB  75 TiB   62.95
MIN/MAX VAR: 0.89/1.26  STDDEV: 6.44
#

Thank you!

George