On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Thanks. I didn't see anything ultra obvious to me.
>
> But I did notice the nearfull warnings so I wonder if this cluster is
> churning through osdmaps? Did you see a large increase in inbound or
> outbound network traffic on this mon following the upgrade?
> Totally speculating here, but maybe there is an issue where you have
> some old clients, which can't decode an incremental osdmap from a
> nautilus mon, so the single mon is busy serving up these maps to the
> clients.
>
> Does the mon load decrease if you stop the osdmap churn?, e.g. by
> setting norebalance if that is indeed ongoing.
>
> Could you also share debug_ms = 1 for a minute of busy cpu mon?

Here are the new logs with debug_ms=1 for a bit:
https://owncloud.leblancnet.us/owncloud/index.php/s/1hvtJo3s2oLPpWn

We do have nearfull warnings, but there are no backfills going on (we
don't do any automatic balancing; we only use a tool that we wrote and
run it periodically). All PGs are currently active+clean. It appears
that some osdmaps are still being trimmed on the OSDs, but I'm not sure
how to validate that (one way I could check is sketched below). Our
system is always under heavy load, so there is always a lot of new data
and a lot of deletions.

We did have our monitor cluster freeze up after the upgrade. It was
completely locked up whenever a `ceph -s` command would run, and it
would not recover on its own. We reduced the mon cluster to a single
node and had to take that monitor node off the network. We then started
and stopped the monitor a few times until we could run `ceph -s`
without the 100% CPU, put it back on the network, and it was fine. We
added the remaining monitor nodes back after deleting their data
directories (fresh install), and everything was fine until the last
batch of OSDs was converted to BlueStore.

The CPU usage for the entire upgrade is here:
https://owncloud.leblancnet.us/owncloud/index.php/s/Jku5z575PdE3AOr
The upgrade started just before 3/25. Some of that load is due to the
mgr.

We do have some very old clients (Ubuntu 14.04) using CephFS that we
can't easily upgrade (we can probably switch them to the FUSE client,
but it will take a long time). A good portion of our clients are Ubuntu
18.04 with a 5.3 kernel. A way to see which client feature bits are
actually connecting is sketched below.

I did set up a for loop to deep scrub all the PGs (also sketched below)
in case Ceph needed to update some internal data structures, as
suggested by this note in the `ceph pg ls` output:
```
* NOTE: Omap statistics are gathered during deep scrub and may be
  inaccurate soon afterwards depending on utilisation. See
  http://docs.ceph.com/docs/master/dev/placement-group/#omap-statistics
  for further details.
```
I thought that might also help get `ceph df` reporting properly so we
know how much storage is actually being used.

We do have crons running to push metrics into Graphite; some of them
run ceph commands and others create/write/read/delete files to gather
file system performance metrics, but none of them trigger scrubs,
rebalancing, etc.

Thank you,
Robert LeBlanc
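To check whether the OSDs are still holding (and trimming) a long range
of old osdmaps, one approach is to compare the osdmap epochs the mons
have committed with what an individual OSD still stores. A minimal
sketch, assuming jq is available and that `ceph report` exposes the
osdmap_first_committed/osdmap_last_committed fields as it does on
Nautilus (osd.0 is just a placeholder id):

```
# Range of osdmap epochs the monitors still have committed:
ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

# Range of osdmap epochs one OSD still stores (run on that OSD's host):
ceph daemon osd.0 status | jq '.oldest_map, .newest_map'
```

If an OSD's oldest_map is far behind the mons' osdmap_first_committed,
that OSD is presumably still catching up on trimming.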
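On the old-client speculation above, the mon can report which client
releases and feature bits are actually connected. A quick sketch (the
mon name is a placeholder):

```
# Summary of connected clients/daemons grouped by release and feature bits:
ceph features

# Per-session detail from the admin socket of a specific mon:
ceph daemon mon.mon1 sessions
```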
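And for completeness, the deep-scrub loop was along these lines; a
minimal sketch (not the exact script we ran), assuming Nautilus's JSON
layout for `ceph pg ls` and that jq is installed:

```
# Request a deep scrub of every PG, pacing the requests slightly.
for pg in $(ceph pg ls -f json | jq -r '.pg_stats[].pgid'); do
    ceph pg deep-scrub "$pg"
    sleep 1
done
```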