So, it looks like my problem resolved itself, of course right after I sent the email to this group saying there was a problem. However, I did notice the following, which coincides with your observations:

I have the pg autoscaler on, but currently there isn't much (write) activity in the cluster. The number of PGs is more or less constant at 256. However, what I noticed after looking at the mgr and mon logs is that overnight (almost every night) the number of PGs would decrease and then slowly creep back up to 256. The next morning, when I check the number of PGs, they "look" steady, but in fact there has been havoc overnight, which caused a lot of PG re-shuffling. My number of misplaced objects also hovered around 5%.

The stddev output of "ceph osd df" has been gradually decreasing from 10% to 6% or so (after installing new drives). Over the weekend, the stddev figure dropped to about 5%, and the re-shuffling activity stopped. Now my stddev is 5.73. Of course, the re-shuffling stopped as soon as I turned on additional debugging (mgr_debug_aggressive_pg_num_changes) on the mgr. (Murphy's law?)

At this point the only thing that makes sense to me is that the PG autoscaler has some logic that forces re-shuffling by playing around with the number of PGs if the stddev of the PG allocation is too high. Can anybody confirm? I also have an EC pool, so it may be related.

Thanks!

George

On Apr 25, 2020, at 6:53 AM, Paul Mezzanini <pfmeec@xxxxxxx> wrote:

This sounds very familiar. I recently bumped the number of PGs on one of our pools from 1k to 8k and it has been rebalancing for weeks. From the standard tools it looks like all the PGs are allocated and running, but it keeps floating at 5% misplaced. I finally found where it reports where it is status-wise:

[pfmeec@scruffy ~]$ ceph osd pool get cold-ec all
~snip~
pg_num: 8192
pgp_num: 1982
~snip~

Whatever is increasing the PGs in the background won't let it go over 5% misplaced, so it is slowly creeping up the pgp_num. Ceph status and our dashboard just show a constant rebalance and a slowly oscillating misplaced-object counter.

Check your pools using the command I used above and see if you have something similar going on. It hasn't bugged me enough to try and find that 5% knob, but perhaps I should. It would also be really nice to have an overall status that I don't need to dig for.

-paul

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx

Sent from my phone. Please excuse any brevity or typos.
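For anyone seeing the same symptom, a quick way to check whether a pool is still catching its pgp_num up to pg_num is to list the pools in detail. The 5% ceiling Paul mentions appears to correspond to the target_max_misplaced_ratio option (assumed default 0.05 on recent releases); treat the exact option name, default, and behavior as assumptions to verify against your Ceph version. A minimal sketch:

# Compare pg_num and pgp_num for every pool; data keeps moving until they match.
ceph osd pool ls detail | grep pg_num

# Inspect the misplaced-ratio ceiling the mgr honours while nudging pgp_num upward
# (assumed option name and default; raising it trades a faster catch-up for more
# concurrent data movement).
ceph config get mgr target_max_misplaced_ratio
ceph config set mgr target_max_misplaced_ratio 0.10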
________________________________
From: Kyriazis, George <george.kyriazis@xxxxxxxxx>
Sent: Friday, April 24, 2020 3:05:44 PM
To: Eugen Block <eblock@xxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re: active+remapped+backfilling keeps going .. and going

pg_autoscaler is on, but the number of PGs is stable. I've seen subsequent calls to "ceph -s" list the same number of total PGs, while the number of PGs in remapped+backfilling increased. I haven't seen anything in the logs, but perhaps I'm not looking in the right place. Any place in particular I should be looking?

Thanks!

George

# ceph -s
  cluster:
    id:     ec2c9542-dc1b-4af6-9f21-0adbcabb9452
    health: HEALTH_WARN
            603 pgs not deep-scrubbed in time
            603 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon: 5 daemons, quorum vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 2h)
    mgr: vis-ivb-07(active, since 2h), standbys: vis-hsw-01, vis-ivb-10, vis-clx-01, vis-clx-05
    mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
    osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 100 remapped pgs

  data:
    pools:   5 pools, 608 pgs
    objects: 46.32M objects, 49 TiB
    usage:   129 TiB used, 75 TiB / 204 TiB avail
    pgs:     8985854/172482064 objects misplaced (5.210%)
             508 active+clean
             100 active+remapped+backfilling

  io:
    client:   102 KiB/s wr, 0 op/s rd, 4 op/s wr
    recovery: 117 MiB/s, 86 objects/s

# ceph -s
  cluster:
    id:     ec2c9542-dc1b-4af6-9f21-0adbcabb9452
    health: HEALTH_WARN
            603 pgs not deep-scrubbed in time
            603 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon: 5 daemons, quorum vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 5h)
    mgr: vis-ivb-07(active, since 5h), standbys: vis-hsw-01, vis-ivb-10, vis-clx-01, vis-clx-05
    mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
    osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 103 remapped pgs

  data:
    pools:   5 pools, 608 pgs
    objects: 46.32M objects, 49 TiB
    usage:   128 TiB used, 75 TiB / 204 TiB avail
    pgs:     8681394/172482064 objects misplaced (5.033%)
             505 active+clean
             103 active+remapped+backfilling

  io:
    recovery: 70 MiB/s, 54 objects/s
#

On Apr 24, 2020, at 1:52 PM, Eugen Block <eblock@xxxxxx> wrote:

Yes, that means it's off. Can you see anything in the logs? They should show that something triggers the rebalancing. Could it be the pg_autoscaler? Is that enabled?

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

Here is the status of my balancer:

# ceph balancer status
{
    "last_optimize_duration": "",
    "plans": [],
    "mode": "none",
    "active": false,
    "optimize_result": "",
    "last_optimize_started": ""
}
#

Doesn't that mean it's "off"?

Thanks,

George

On Apr 24, 2020, at 1:49 AM, Lomayani S. Laizer <lomlaizer@xxxxxxxxx> wrote:

I had a similar problem when I upgraded to Octopus, and the solution was to turn off auto-balancing. You can try turning it off if it is enabled:

ceph balancer off

On Fri, Apr 24, 2020 at 8:51 AM Eugen Block <eblock@xxxxxx> wrote:

Hi,

the balancer is probably running; which mode? I changed the mode to none in our own cluster because it also never finished rebalancing and we didn't have a bad PG distribution. Maybe it's supposed to be like that, I don't know.

Regards
Eugen
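Since both the balancer and the pg autoscaler come up as suspects in this thread, here is a short, hedged sketch of how to inspect each one and stop it from acting. The pool name is a placeholder, and disabling the pg_autoscaler mgr module may not be possible on every release, so verify against your version first:

# Balancer: show the current mode, switch it to none, or turn it off entirely.
ceph balancer status
ceph balancer mode none
ceph balancer off

# PG autoscaler: see what it intends to do per pool, then keep it from acting,
# either per pool ("warn" still reports its advice) or cluster-wide via the mgr module.
ceph osd pool autoscale-status
ceph osd pool set <pool-name> pg_autoscale_mode warn
ceph mgr module disable pg_autoscaler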
Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

Hello,

I have a Proxmox ceph cluster with 5 nodes and 3 OSDs each (total 15 OSDs), on a 10G network. The cluster started small, and I've progressively added OSDs over time. Problem is, the cluster never rebalances completely. There is always progress on backfilling, but PGs that used to be in the active+clean state jump back into the active+remapped+backfilling (or active+remapped+backfill_wait) state, to be moved to different OSDs.

Initially I had a 1G network (recently upgraded to 10G), and I was holding back on the backfill settings (osd_max_backfills and osd_recovery_sleep_hdd). I just recently (in the last few weeks) upgraded to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0 (only HDDs, no SSDs). The cluster has been backfilling for months now with no end in sight.

Is this normal behavior? Is there any setting I can look at that will give me an idea as to why PGs are jumping back into remapped from clean?

Below is the output of "ceph osd tree" and "ceph osd df":

# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME           STATUS REWEIGHT PRI-AFF
 -1       203.72472 root default
 -9        40.01666     host vis-hsw-01
  3   hdd  10.91309         osd.3           up  1.00000 1.00000
  6   hdd  14.55179         osd.6           up  1.00000 1.00000
 10   hdd  14.55179         osd.10          up  1.00000 1.00000
-13        40.01666     host vis-hsw-02
  0   hdd  10.91309         osd.0           up  1.00000 1.00000
  7   hdd  14.55179         osd.7           up  1.00000 1.00000
 11   hdd  14.55179         osd.11          up  1.00000 1.00000
-11        40.01666     host vis-hsw-03
  4   hdd  10.91309         osd.4           up  1.00000 1.00000
  8   hdd  14.55179         osd.8           up  1.00000 1.00000
 12   hdd  14.55179         osd.12          up  1.00000 1.00000
 -3        40.01666     host vis-hsw-04
  5   hdd  10.91309         osd.5           up  1.00000 1.00000
  9   hdd  14.55179         osd.9           up  1.00000 1.00000
 13   hdd  14.55179         osd.13          up  1.00000 1.00000
-15        43.65807     host vis-hsw-05
  1   hdd  14.55269         osd.1           up  1.00000 1.00000
  2   hdd  14.55269         osd.2           up  1.00000 1.00000
 14   hdd  14.55269         osd.14          up  1.00000 1.00000
 -5               0     host vis-ivb-07
 -7               0     host vis-ivb-10
#

# ceph osd df
ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA    OMAP    META   AVAIL   %USE  VAR  PGS STATUS
 3   hdd 10.91309  1.00000 11 TiB 8.2 TiB 8.2 TiB 552 MiB 25 GiB 2.7 TiB 75.08 1.19 131     up
 6   hdd 14.55179  1.00000 15 TiB 9.1 TiB 9.1 TiB 1.2 GiB 30 GiB 5.5 TiB 62.47 0.99 148     up
10   hdd 14.55179  1.00000 15 TiB 8.1 TiB 8.1 TiB 1.5 GiB 20 GiB 6.4 TiB 55.98 0.89 142     up
 0   hdd 10.91309  1.00000 11 TiB 7.5 TiB 7.4 TiB 504 MiB 24 GiB 3.5 TiB 68.34 1.09 120     up
 7   hdd 14.55179  1.00000 15 TiB 8.7 TiB 8.7 TiB 1.0 GiB 31 GiB 5.8 TiB 60.07 0.95 144     up
11   hdd 14.55179  1.00000 15 TiB 9.4 TiB 9.3 TiB 819 MiB 20 GiB 5.2 TiB 64.31 1.02 147     up
 4   hdd 10.91309  1.00000 11 TiB 7.0 TiB 7.0 TiB 284 MiB 25 GiB 3.9 TiB 64.35 1.02 112     up
 8   hdd 14.55179  1.00000 15 TiB 9.3 TiB 9.2 TiB 1.8 GiB 29 GiB 5.3 TiB 63.65 1.01 157     up
12   hdd 14.55179  1.00000 15 TiB 8.6 TiB 8.6 TiB 623 MiB 19 GiB 5.9 TiB 59.14 0.94 136     up
 5   hdd 10.91309  1.00000 11 TiB 8.6 TiB 8.6 TiB 542 MiB 29 GiB 2.3 TiB 79.01 1.26 134     up
 9   hdd 14.55179  1.00000 15 TiB 8.2 TiB 8.2 TiB 707 MiB 27 GiB 6.3 TiB 56.56 0.90 138     up
13   hdd 14.55179  1.00000 15 TiB 8.7 TiB 8.7 TiB 741 MiB 18 GiB 5.8 TiB 59.85 0.95 134     up
 1   hdd 14.55269  1.00000 15 TiB 9.8 TiB 9.8 TiB 1.3 GiB 20 GiB 4.8 TiB 67.18 1.07 158     up
 2   hdd 14.55269  1.00000 15 TiB 8.7 TiB 8.7 TiB 936 MiB 18 GiB 5.8 TiB 60.04 0.95 148     up
14   hdd 14.55269  1.00000 15 TiB 8.3 TiB 8.3 TiB 673 MiB 18 GiB 6.3 TiB 56.97 0.90 131     up
                   TOTAL  204 TiB 128 TiB 128 TiB  13 GiB 350 GiB 75 TiB 62.95
MIN/MAX VAR: 0.89/1.26  STDDEV: 6.44
#

Thank you!

George
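For anyone trying to reproduce the tuning mentioned above, here is a hedged sketch of how to apply those backfill throttles cluster-wide at runtime, and how to watch whether progress is gated by pgp_num still catching up to pg_num rather than by backfill speed. The values simply mirror the ones quoted in this thread; they are not recommendations:

# Apply the backfill/recovery throttles from the message above to all OSDs at runtime.
ceph config set osd osd_max_backfills 50
ceph config set osd osd_recovery_sleep_hdd 0

# Periodically compare the misplaced-object counter against pg_num/pgp_num per pool;
# while pgp_num is still below pg_num, new remapping will keep being triggered.
watch -n 60 "ceph -s; ceph osd pool ls detail | grep pg_num"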
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx