Re: iostat and dashboard freezing

On 9/2/19 5:47 PM, Jake Grimmett wrote:
Hi Konstantin,

To confirm, disabling the balancer allows the mgr to work properly.

I tried re-enabling the balancer; it briefly worked, then locked up the
mgr again.

Here it's working OK...
[root@ceph-s1 ~]# time ceph balancer optimize new

real	0m1.628s
user	0m0.583s
sys	0m0.075s

[root@ceph-s1 ~]# ceph balancer status
{
     "active": false,
     "plans": [
         "new"
     ],
     "mode": "upmap"
}

[root@ceph-s1 ~]# ceph balancer on

At this point the balancer initially seems to be working, as 'ceph -s'
shows the misplaced count going from 0 to ...
     pgs:     6829497/4977639365 objects misplaced (0.137%)
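
For what it's worth, I keep an eye on that counter with something like
the following (assumes 'watch' is available on the admin node):

watch -n 10 "ceph -s | grep misplaced"    # re-check every 10 seconds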

However, the mgr now goes back up to 100% CPU, and stopping the balancer
is very difficult:

[root@ceph-s1 ~]# time ceph balancer off
real	5m37.641s
user	0m0.751s
sys	0m0.158s

[root@ceph-s1 ~]# time ceph balancer optimize new

real	18m19.202s
user	0m1.388s
sys	0m0.413s
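
While it is locked up like this, I double-check that it really is the
active mgr burning the CPU (assumes a ceph-mgr process on the local node):

ceph mgr stat                              # shows which mgr is currently active
top -b -n 1 -p "$(pgrep -d, ceph-mgr)"     # CPU usage of the local ceph-mgr process(es)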


Here is the other data you requested:
[root@ceph-s1 ~]# ceph config-key ls | grep balance
     "config-history/10/+mgr/mgr/balancer/active",
     "config-history/29/+mgr/mgr/balancer/active",
     "config-history/29/-mgr/mgr/balancer/active",
     "config-history/30/+mgr/mgr/balancer/active",
     "config-history/30/-mgr/mgr/balancer/active",
     "config-history/31/+mgr/mgr/balancer/active",
     "config-history/31/-mgr/mgr/balancer/active",
     "config-history/32/+mgr/mgr/balancer/active",
     "config-history/32/-mgr/mgr/balancer/active",
     "config-history/33/+mgr/mgr/balancer/active",
     "config-history/33/-mgr/mgr/balancer/active",
     "config-history/9/+mgr/mgr/balancer/mode",
     "config/mgr/mgr/balancer/active",
     "config/mgr/mgr/balancer/mode",

We have two main pools:

Pool #1 is 3x replicated, has 4 NVMe OSDs, and is only used for CephFS
metadata. It lives on 4 nodes (which also run the mgr, mon and mds).

Pool #2 is erasure coded 8+2, has 324 x 12 TB OSDs across 36 nodes, and is
the data pool for CephFS. All OSDs in pool #2 have their DB/WAL on NVMe
(6 HDDs per NVMe).
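
For context, the data pool was built from an 8+2 profile roughly along
these lines (illustrative only; not the exact profile/pool names or PG
counts we used):

ceph osd erasure-code-profile set ec82 k=8 m=2 crush-failure-domain=host
ceph osd pool create cephfs_data 4096 4096 erasure ec82
ceph osd pool set cephfs_data allow_ec_overwrites true   # required for CephFS data on EC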

'ceph df detail' is here:
http://p.ip.fi/4l4m

'ceph osd tree' is here:
http://p.ip.fi/k1x2

'ceph osd df tree' is here:
http://p.ip.fi/g7ma

Any help appreciated,


Jake, you already have good VAR values for your OSDs.

I suggest setting `mgr/balancer/upmap_max_deviation` to '2', and raising `mgr/balancer/sleep_interval` to '300'.
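
Something along these lines should apply that (a sketch assuming the
upmap_max_deviation and sleep_interval module options; double-check the
option names on your release):

ceph config set mgr mgr/balancer/upmap_max_deviation 2
ceph config set mgr mgr/balancer/sleep_interval 300

Raising the sleep interval at least gives the mgr some breathing room
between balancer passes.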




k

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


