See responses below.
On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin < k0ste@xxxxxxxx> wrote:
Just a follow-up 24 hours later: the mgrs seem far more stable, with no issues or weirdness since disabling the balancer module.
Which isn't great, because the balancer plays an important role, but after fighting distribution for a few weeks and getting it 'good enough', I'm taking the stability.
Just wanted to follow up with another 2¢.
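(For anyone searching the archives later: turning the balancer off is a one-liner; what I describe above as "disabling the balancer module" was along these lines, give or take the exact command. Note that on releases where the balancer is an always-on mgr module, the module itself can't be disabled, only switched off.)

$ ceph balancer off      # stop automatic plan creation/execution
$ ceph balancer status   # confirm "active": false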
What are your balancer settings (`ceph config-key ls`)? Is your mgr
running in a virtual environment or on bare metal?
bare metal

$ ceph config-key ls | grep balance
"config/mgr/mgr/balancer/active",
"config/mgr/mgr/balancer/max_misplaced",
"config/mgr/mgr/balancer/mode",
"config/mgr/mgr/balancer/pool_ids",
"mgr/balancer/active",
"mgr/balancer/max_misplaced",
"mgr/balancer/mode",
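Side note for the archives: the mode and max_misplaced values behind those keys can be read back individually, and the balancer state checked, with something like the following (key names taken from the grep above):

$ ceph config-key get mgr/balancer/mode
$ ceph config-key get mgr/balancer/max_misplaced
$ ceph balancer status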
How many pools do you have? Please also paste `ceph osd tree` &
`ceph osd df tree`.
$ ceph osd pool ls detail
pool 16 replicated crush_rule 1 object_hash rjenkins pg_num 4 autoscale_mode warn last_change 157895 lfor 0/157895/157893 flags hashpspool,nodelete stripe_width 0 application cephfs
pool 17 replicated crush_rule 0 object_hash rjenkins pg_num 1024 autoscale_mode warn last_change 174817 flags hashpspool,nodelete stripe_width 0 compression_algorithm snappy compression_mode aggressive application cephfs
pool 20 replicated crush_rule 2 object_hash rjenkins pg_num 4096 autoscale_mode warn last_change 174817 flags hashpspool,nodelete stripe_width 0 application freeform
pool 24 replicated crush_rule 0 object_hash rjenkins pg_num 16 autoscale_mode warn last_change 174817 lfor 0/157704/157702 flags hashpspool stripe_width 0 compression_algorithm snappy compression_mode none application freeform
pool 29 replicated crush_rule 2 object_hash rjenkins pg_num 128 autoscale_mode warn last_change 174817 lfor 0/0/142604 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 30 replicated crush_rule 0 object_hash rjenkins pg_num 1 autoscale_mode warn last_change 174817 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 31 replicated crush_rule 2 object_hash rjenkins pg_num 16 autoscale_mode warn last_change 174926 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
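For eyeballing how even the PG distribution actually is across OSDs (which is what the balancer is chasing here), something like the following works; the jq filter is an assumption on my part, not required:

$ ceph osd df tree
$ ceph osd df -f json | jq -r '.nodes[] | [.name, .pgs] | @tsv'   # OSD name and PG count per OSD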
Measure the time of balancer plan creation: `time ceph balancer
optimize new`.
I hadn't seen this optimize command yet; I was always doing `balancer eval $plan`, `balancer execute $plan`.

$ time ceph balancer optimize newplan1
Error EALREADY: Unable to find further optimization, or pool(s)' pg_num is decreasing, or distribution is already perfect

real    3m10.627s
user    0m0.352s
sys     0m0.055s
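For completeness, since I mention the eval/execute workflow above, the plan-based sequence I was using looks roughly like this (the plan name "myplan" is just a placeholder):

$ ceph balancer eval               # score the current distribution
$ ceph balancer optimize myplan    # have the balancer compute a plan
$ ceph balancer eval myplan        # score what the plan would give
$ ceph balancer show myplan        # inspect the proposed changes
$ ceph balancer execute myplan     # apply it
$ ceph balancer rm myplan          # optional clean-up afterwards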
Reed