Re: Slow OSD startup and slow ops

Hi,

On 9/21/22 18:00, Gauvain Pocentek wrote:
Hello all,

We are running several Ceph clusters and are facing an issue on one of
them; we would appreciate some input on the problems we're seeing.

We run Ceph in containers on Centos Stream 8, and we deploy using
ceph-ansible. While upgrading Ceph from 16.2.7 to 16.2.10, we noticed that
OSDs were taking a very long time to restart on one of the clusters. (Other
clusters were not impacted at all.)

Are the other clusters of similar size?


The OSD startup was sometimes so slow
that we ended up having slow ops, with 1 or 2 PGs stuck in a peering state.
We interrupted the upgrade and the cluster runs fine now, although we
have recently seen 1 OSD flapping and having trouble coming back to life.

We've checked a lot of things and read a lot of mails from this list; here
is some information:

* this cluster has RBD pools for OpenStack and RGW pools; everything is
replicated x 3, except the RGW data pool which is EC 4+2
* we haven't found any hardware related issues; we run fully on SSDs and
they are all in good shape, no network issue, RAM and CPU are available on
all OSD hosts
* bluestore with an LVM collocated setup
* we have seen the slow restart with almost all the OSDs we've upgraded
(100 out of 350)
* on restart the ceph-osd process runs at 100% CPU but we haven't seen
anything weird on the host

Are the containers restricted to a certain amount of CPU? Do the OSDs, after ~10-20 seconds, increase their CPU usage to 200%? If so, this is probably because of the RocksDB option max_background_compactions = 2.
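If you want to check that, the RocksDB options the OSD actually runs with can be read from the admin socket (option name as on Octopus/Pacific; just a quick sketch, adjust the OSD id for your setup):

    # on the OSD host, against the OSD container's admin socket
    ceph daemon osd.<id> config get bluestore_rocksdb_options
    # look for max_background_compactions (or max_background_jobs) in the output

And `docker stats --no-stream` / `podman stats --no-stream` gives a rough idea of whether the container has a CPU limit.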

* no DB spillover
* we have other clusters with the same hardware, and we don't see problems
there

The only thing that we found that looks suspicious is the number of op logs
for the PGs of the RGW index pool. `osd_max_pg_log_entries` is set to 10k
but `ceph pg dump` shows PGs with more than 100k logs (the largest one has >
400k logs).

Could this be the reason for the slow startup of the OSDs? If so, is there a way
to trim these logs without too much impact on the cluster?

Not sure. We have ~ 2K logs per PG.
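If it helps, a rough way to find the PGs with the biggest logs (the JSON path and the `log_size` field are what I see on Octopus; they may differ on your release):

    ceph pg dump --format json 2>/dev/null \
      | jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.log_size)"' \
      | sort -k2 -nr | head -20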


Let me know if additional info or logs are needed.

Do you have a log of the slow ops and the OSD logs?

Do you have any non-standard configuration for the daemons? I.e. what does `ceph daemon osd.$id config diff` show?
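Something along these lines would already be useful (plain admin socket commands):

    ceph daemon osd.<id> config diff
    ceph daemon osd.<id> dump_ops_in_flight
    ceph daemon osd.<id> dump_historic_ops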

We are running a Ceph Octopus (15.2.16) cluster with a similar configuration. We have *a lot* of slow ops when starting OSDs, and also during peering. When the OSDs start they consume 100% CPU for up to ~10 seconds, and after that 200% for a minute or more. During that time the OSDs perform a compaction; you should be able to find this in the OSD logs if it's the same in your case.

After some time the OSDs are done initializing and start the boot process. As soon as they boot up and start peering, the slow ops kick in: lots of "transitioning to Primary" and "transitioning to Stray" logging. Some time later the OSD becomes "active". While the OSD is busy with peering it is also busy compacting, as I also see RocksDB compaction logging. So it might be that RocksDB compactions impact OSD performance while the OSD is already busy becoming primary (and/or secondary / tertiary) for its PGs.
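To check whether your OSDs show the same behaviour, grepping the OSD log for RocksDB compaction activity around startup should be enough; with containerized OSDs ours ends up in journald, so something like this (unit name may differ in your deployment):

    journalctl -u ceph-osd@<id> --since "1 hour ago" | grep -i compact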

We had norecover, nobackfill, norebalance active when booting the OSDs.

So it might just take a long time to do the RocksDB compactions. In that case it might be better to let all needed RocksDB compactions finish first and only then let the OSDs boot. What might help is to set the "noup" flag (`ceph osd set noup`): this prevents the OSDs from being marked up, so you can wait for the RocksDB compactions to finish and unset the flag afterwards.
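Roughly, the workflow I have in mind (just a sketch, not something we have fully tested):

    ceph osd set noup
    # restart/upgrade the OSDs; they boot and compact but stay marked down
    # wait until the compactions are done (CPU drops, compaction logging stops)
    ceph osd unset noup

You could probably also trigger the compaction explicitly via the admin socket with `ceph daemon osd.<id> compact` while the flag is set.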

If you try this, please let me know how it goes.

Gr. Stefan






