Re: Nautilus 14.2.19 mon 100% CPU

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Fri, 9 Apr 2021 19:49:13 +0200

On Fri, Apr 9, 2021 at 7:24 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Robert,
> >
> > Have you checked a log with debug_mon=20 yet to try to see what it's doing?
> >
> I've posted the logs with debug_mon=20 for a period during high CPU
> here https://owncloud.leblancnet.us/owncloud/index.php/s/OtHsBAYN9r5eSbU
>
> You can look near the end of the log for the verbose logging. I'm not
> sure what to look for in there, nothing sticks out to me. I did
> disable cephx in the config file to see if that would help, but we
> still have the 100% CPU.
>

Thanks. I didn't see anything ultra obvious to me.

But I did notice the nearfull warnings so I wonder if this cluster is
churning through osdmaps? Did you see a large increase in inbound or
outbound network traffic on this mon following the upgrade?
Totally speculating here, but maybe there is an issue where you have
some old clients, which can't decode an incremental osdmap from a
nautilus mon, so the single mon is busy serving up these maps to the
clients.

Does the mon load decrease if you stop the osdmap churn, e.g. by
setting norebalance, if rebalancing is indeed ongoing?
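
A rough way to check for churn is to sample the osdmap epoch twice and
see how far it advanced, then repeat with norebalance set. The sketch
below assumes the ceph CLI is on PATH with an admin keyring; the
function names osdmap_epoch and osdmap_churn are made up for
illustration and are not part of Ceph.

```shell
#!/bin/sh
# Sketch only: estimate osdmap churn by sampling the osdmap epoch twice.
# Assumes the `ceph` CLI is available and can reach the cluster.

osdmap_epoch() {
    # The plain-text output of `ceph osd dump` starts with "epoch <N>".
    ceph osd dump | awk '/^epoch / { print $2; exit }'
}

osdmap_churn() {
    # $1: seconds to wait between the two samples (default 60)
    interval="${1:-60}"
    first=$(osdmap_epoch)
    sleep "$interval"
    second=$(osdmap_epoch)
    echo "osdmap epochs advanced by $((second - first)) in ${interval}s"
}

# Example use (commented out so sourcing this file has no side effects):
# osdmap_churn 60
# ceph osd set norebalance      # pause rebalancing
# osdmap_churn 60               # compare the churn and the mon CPU
# ceph osd unset norebalance    # restore when done
```

If the epoch barely moves in either case, osdmap churn is probably not
what is keeping the mon busy.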

Could you also share a log with debug_ms = 1 covering about a minute
while the mon CPU is busy?
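
One way to capture that, sketched under some assumptions: it is run on
the mon host itself so the admin socket is reachable, the mon id is
passed in by hand, and the value to restore afterwards is a guess
(0/0 here) unless you pass your own. The function name
capture_mon_ms_debug is hypothetical.

```shell
#!/bin/sh
# Sketch only: raise debug_ms on one mon for a short window, then restore it.
# Uses the admin socket (`ceph daemon`), so it must run on the mon's host.

capture_mon_ms_debug() {
    # $1: mon id (often the short hostname)
    # $2: seconds to keep debug_ms at 1 (default 60)
    # $3: value to restore afterwards (default 0/0, an assumption;
    #     pass whatever your cluster was using before)
    mon_id="$1"
    duration="${2:-60}"
    restore="${3:-0/0}"

    ceph daemon "mon.${mon_id}" config set debug_ms 1
    sleep "$duration"
    ceph daemon "mon.${mon_id}" config set debug_ms "$restore"
}

# Example use (commented out so sourcing this file has no side effects):
# capture_mon_ms_debug "$(hostname -s)" 60
```

With default log settings the messenger lines should end up in the
mon's usual log file under /var/log/ceph/.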

-- dan

> Thank you,
> Robert LeBlanc
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx