On Fri, Apr 9, 2021 at 7:24 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Robert,
> >
> > Have you checked a log with debug_mon=20 yet to try to see what it's doing?
> >
>
> I've posted the logs with debug_mon=20 for a period during high CPU
> here https://owncloud.leblancnet.us/owncloud/index.php/s/OtHsBAYN9r5eSbU
>
> You can look near the end of the log for the verbose logging. I'm not
> sure what to look for in there, nothing sticks out to me. I did
> disable cephx in the config file to see if that would help, but we
> still have the 100% CPU.
>

Thanks. I didn't see anything ultra obvious to me.

But I did notice the nearfull warnings, so I wonder if this cluster is
churning through osdmaps? Did you see a large increase in inbound or
outbound network traffic on this mon following the upgrade?

Totally speculating here, but maybe there is an issue where you have
some old clients which can't decode an incremental osdmap from a
nautilus mon, so the single mon is busy serving up these maps to the
clients.

Does the mon load decrease if you stop the osdmap churn, e.g. by
setting norebalance if that is indeed ongoing?

Could you also share debug_ms = 1 for a minute of busy cpu mon?

-- dan

> Thank you,
> Robert LeBlanc
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
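
To put a number on the osdmap churn question above, here is a minimal sketch, not a tested tool. It assumes the `ceph` CLI is available on the host and that `ceph osd dump -f json` reports the current map epoch under a top-level "epoch" key. The function names (osdmap_epoch, epochs_per_minute, measure_churn) are made up for illustration. The idea is to sample the epoch twice and compare the rate before and after something like `ceph osd set norebalance`.

```python
import json
import subprocess
import time


def osdmap_epoch(dump_json: str) -> int:
    """Return the osdmap epoch from `ceph osd dump -f json` output.

    Assumes the JSON document has a top-level "epoch" field.
    """
    return int(json.loads(dump_json)["epoch"])


def epochs_per_minute(first: int, second: int, interval_s: float) -> float:
    """Convert two epoch samples taken interval_s seconds apart to a rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (second - first) * 60.0 / interval_s


def sample_epoch() -> int:
    """Ask the cluster for the current osdmap epoch via the ceph CLI."""
    result = subprocess.run(
        ["ceph", "osd", "dump", "-f", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return osdmap_epoch(result.stdout)


def measure_churn(interval_s: float = 60.0, sampler=sample_epoch,
                  sleep=time.sleep) -> float:
    """Sample the epoch twice and return new osdmap epochs per minute.

    sampler and sleep are injectable so the logic can be exercised
    without a live cluster.
    """
    first = sampler()
    sleep(interval_s)
    second = sampler()
    return epochs_per_minute(first, second, interval_s)
```

Calling measure_churn(60) on an admin node once before and once after setting norebalance would show whether the epoch rate drops; whether that rate correlates with the mon CPU load is exactly the open question, so treat the result as a hint only.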