See responses below.
On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin < k0ste@xxxxxxxx> wrote:
Just a follow-up 24 hours later: the mgrs seem far more stable, with no issues or weirdness since disabling the balancer module.
Which isn't great, because the balancer plays an important role, but after fighting distribution for a few weeks and getting it 'good enough', I'm taking the stability.
Just wanted to follow up with another 2¢.
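(For anyone searching the archives later: turning the balancer off is a one-liner; what I describe above as "disabling the balancer module" was along these lines, give or take the exact command. Note that on releases where the balancer is an always-on mgr module, the module itself can't be disabled, only switched off.)

$ ceph balancer off      # stop automatic plan creation/execution
$ ceph balancer status   # confirm "active": false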
What are your balancer settings (`ceph config-key ls`)? Is your mgr
running in a virtual environment or on bare metal?
bare metal

$ ceph config-key ls | grep balance
"config/mgr/mgr/balancer/active",
"config/mgr/mgr/balancer/max_misplaced",
"config/mgr/mgr/balancer/mode",
"config/mgr/mgr/balancer/pool_ids",
"mgr/balancer/active",
"mgr/balancer/max_misplaced",
"mgr/balancer/mode",
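Side note for the archives: the mode and max_misplaced values behind those keys can be read back individually, and the balancer state checked, with something like the following (key names taken from the grep above):

$ ceph config-key get mgr/balancer/mode
$ ceph config-key get mgr/balancer/max_misplaced
$ ceph balancer status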
How many pools do you have? Please also paste `ceph osd tree` &
`ceph osd df tree`.
$ ceph osd pool ls detail
pool 16 replicated crush_rule 1 object_hash rjenkins pg_num 4 autoscale_mode warn last_change 157895 lfor 0/157895/157893 flags hashpspool,nodelete stripe_width 0 application cephfs
pool 17 replicated crush_rule 0 object_hash rjenkins pg_num 1024 autoscale_mode warn last_change 174817 flags hashpspool,nodelete stripe_width 0 compression_algorithm snappy compression_mode aggressive application cephfs
pool 20 replicated crush_rule 2 object_hash rjenkins pg_num 4096 autoscale_mode warn last_change 174817 flags hashpspool,nodelete stripe_width 0 application freeform
pool 24 replicated crush_rule 0 object_hash rjenkins pg_num 16 autoscale_mode warn last_change 174817 lfor 0/157704/157702 flags hashpspool stripe_width 0 compression_algorithm snappy compression_mode none application freeform
pool 29 replicated crush_rule 2 object_hash rjenkins pg_num 128 autoscale_mode warn last_change 174817 lfor 0/0/142604 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 30 replicated crush_rule 0 object_hash rjenkins pg_num 1 autoscale_mode warn last_change 174817 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 31 replicated crush_rule 2 object_hash rjenkins pg_num 16 autoscale_mode warn last_change 174926 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
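For eyeballing how even the PG distribution actually is across OSDs (which is what the balancer is chasing here), something like the following works; the jq filter is an assumption on my part, not required:

$ ceph osd df tree
$ ceph osd df -f json | jq -r '.nodes[] | [.name, .pgs] | @tsv'   # OSD name and PG count per OSD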
Measure the time of balancer plan creation: `time ceph balancer
optimize new`.
I hadn't seen this optimize command yet; I was always doing `balancer eval $plan`, `balancer execute $plan`.

$ time ceph balancer optimize newplan1
Error EALREADY: Unable to find further optimization, or pool(s)' pg_num is decreasing, or distribution is already perfect

real    3m10.627s
user    0m0.352s
sys     0m0.055s
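For completeness, since I mention the eval/execute workflow above, the plan-based sequence I was using looks roughly like this (the plan name "myplan" is just a placeholder):

$ ceph balancer eval               # score the current distribution
$ ceph balancer optimize myplan    # have the balancer compute a plan
$ ceph balancer eval myplan        # score what the plan would give
$ ceph balancer show myplan        # inspect the proposed changes
$ ceph balancer execute myplan     # apply it
$ ceph balancer rm myplan          # optional clean-up afterwards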
Reed