Re: iostat and dashboard freezing

Reed Dier <reed.dier@xxxxxxxxxxx> · Tue, 27 Aug 2019 11:53:41 -0500

Just to further piggyback,
The mgr seems to get pushed hardest when the balancer is engaged.
When I try to eval a pool or the cluster, it takes 30-120 seconds to score it, then another 30-120 seconds to execute the plan, and it never seems to engage automatically.

$ time ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

real    0m36.490s
user    0m0.259s
sys     0m0.044s
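The timing above shows about 36 s of wall-clock time but only about 0.3 s of user plus sys CPU, so the client presumably spends nearly all of that time waiting for the mgr to answer. To track that latency over time instead of by hand, the command could be wrapped in a small script. This is a minimal sketch, assuming the ceph CLI is on PATH and accepts the standard --format json option; the function names are hypothetical and not part of any Ceph tooling.

```python
import json
import subprocess
import time


def timed_ceph_json(args, timeout=180.0):
    """Run a ceph CLI command with JSON output; return (parsed, wall_seconds).

    Assumes the `ceph` binary is on PATH and accepts `--format json`.
    The timeout keeps the caller from hanging if the mgr never answers;
    subprocess.TimeoutExpired is raised in that case.
    """
    start = time.monotonic()
    proc = subprocess.run(
        ["ceph", *args, "--format", "json"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    elapsed = time.monotonic() - start
    return json.loads(proc.stdout), elapsed


def balancer_summary(status):
    """Reduce the dict printed by `ceph balancer status` to a few fields."""
    return {
        "active": bool(status.get("active")),
        "mode": status.get("mode"),
        "pending_plans": len(status.get("plans", [])),
    }
```

Usage would be something like: status, secs = timed_ceph_json(["balancer", "status"]), then logging balancer_summary(status) together with secs, so slow mgr responses show up as a time series rather than a one-off observation.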

I'm going to disable mine as well, and see if I can stop waking up to 'No Active MGR.'

You can see when I lose mgrs, because RBD image stats drop to 0 until I catch it.

Thanks,

Reed

On Aug 27, 2019, at 11:24 AM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:

Hi Reed, Lenz, John

I've just tried disabling the balancer; so far ceph-mgr is keeping its
CPU mostly under 20%, even with both the iostat and dashboard modules back on.

# ceph balancer off

was
[root@ceph-s1 backup]# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

now
[root@ceph-s1 backup]# ceph balancer status
{
    "active": false,
    "plans": [],
    "mode": "upmap"
}

We are using 8:2 erasure coding across 324 12TB OSDs, plus 4 NVMe OSDs
for a replicated CephFS metadata pool.

Let me know if the balancer is your problem too...

best,

Jake

On 8/27/19 3:57 PM, Jake Grimmett wrote:
Yes, the problem still occurs with the dashboard disabled...

Possibly relevant, when both the dashboard and iostat plugins are
disabled, I occasionally see ceph-mgr rise to 100% CPU.
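To record ceph-mgr CPU use over time, so that spikes like the 100% readings reported in this thread can be lined up against iostat freezes or balancer runs, one could sample the process's accumulated CPU time from /proc twice and divide by the interval. This is a minimal sketch, assuming a Linux host and that the ceph-mgr PID is found separately (for example with pidof ceph-mgr); the function names are hypothetical. It reports usage as a percentage of one core, the same convention top uses by default, so one fully busy thread shows as 100%.

```python
import os
import time


def proc_cpu_ticks(pid):
    """Return utime + stime (in clock ticks) for a Linux process."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split what follows the last ')'. That remainder starts at field 3.
    fields = stat[stat.rindex(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """CPU use over an interval as a percentage of one core (top-style)."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def sample_cpu(pid, interval_s=5.0):
    """Sample a process's CPU use over interval_s seconds."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = proc_cpu_ticks(pid)
    time.sleep(interval_s)
    after = proc_cpu_ticks(pid)
    return cpu_percent(before, after, interval_s, clk_tck)
```

Calling sample_cpu in a loop and writing each value with a timestamp would give a log that can be compared with when the dashboard iostat view stops updating.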

As suggested by John Hearns, the output of gstack ceph-mgr when it is at
100% CPU is here:

http://p.ip.fi/52sV

many thanks

Jake

On 8/27/19 3:09 PM, Reed Dier wrote:
I'm currently seeing this with the dashboard disabled.

Disabling prometheus and rbd_support reduces my instability but doesn't
wholly cure it. I use those two in tandem, since the only thing I use the
Prometheus exporter for is the per-RBD metrics.

ceph mgr module ls
{
    "enabled_modules": [
        "diskprediction_local",
        "influx",
        "iostat",
        "prometheus",
        "rbd_support",
        "restful",
        "telemetry"
    ],

I'm on Ubuntu 18.04, so that doesn't support the possible OS
correlation.
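Since the thread narrows the problem down by switching mgr modules off one at a time, it may help to check quickly which of the modules mentioned here are currently enabled. This is a minimal sketch, assuming the JSON printed by ceph mgr module ls has an "enabled_modules" list of names, as in the output above; the helper name and the suspect list are illustrative choices drawn from this thread, not Ceph tooling. The balancer is left out because it is toggled separately with ceph balancer off.

```python
import json

# Modules raised in this thread as possible causes of ceph-mgr CPU spikes.
SUSPECT_MODULES = ("dashboard", "iostat", "prometheus", "rbd_support")


def enabled_suspects(module_ls_json, suspects=SUSPECT_MODULES):
    """Return the suspect modules present in "enabled_modules".

    `module_ls_json` is the text printed by `ceph mgr module ls`.
    Order follows `suspects`, so results are stable between runs.
    """
    data = json.loads(module_ls_json)
    enabled = set(data.get("enabled_modules", []))
    return [m for m in suspects if m in enabled]
```

For Reed's module list above this would report iostat, prometheus and rbd_support; each could then be switched off in turn with ceph mgr module disable to see which one the CPU spikes follow.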

Thanks,

Reed

On Aug 27, 2019, at 8:37 AM, Lenz Grimmer <lgrimmer@xxxxxxxx> wrote:

Hi Jake,

On 8/27/19 3:22 PM, Jake Grimmett wrote:

That exactly matches what I'm seeing:

When iostat is working OK, I see ~5% CPU use by ceph-mgr,
and when iostat freezes, ceph-mgr CPU increases to 100%.

Does this also occur if the dashboard module is disabled? Just wondering
if this is isolatable to the iostat module. Thanks!

Lenz

-- 
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 247165 (AG Nürnberg)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
