Re: Nautilus 14.2.19 mon 100% CPU

Stefan Kooman <stefan@xxxxxx> · Thu, 8 Apr 2021 21:11:06 +0200

On 4/8/21 6:22 PM, Robert LeBlanc wrote:
> I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
> converted the last batch of FileStore OSDs to BlueStore about 36 hours
> ago. Yesterday our monitor cluster went nuts and started constantly
> calling elections because the monitor nodes were at 100% CPU and wouldn't
> respond to heartbeats. I reduced the monitor cluster to one to prevent
> the constant elections, and that let the system limp along until the
> backfills finished. There are long stretches of time where ceph commands
> hang while the CPU is at 100%; when the CPU drops, I see a lot of work
> getting done in the monitor logs, which stops as soon as the CPU is at
> 100% again.

Try reducing mon_sync_max_payload_size to 4096. I have seen Frank Schilder
advise this several times for monitor issues, including recently for a
cluster that was upgraded from Luminous -> Mimic -> Nautilus.

Worth a shot.
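A minimal sketch of applying that setting through the centralized config database (available since Mimic), assuming the `ceph` CLI is on PATH with admin credentials and that the mon quorum is responsive enough to accept `ceph config` commands. If the mons are too busy for that, putting the option in the [mon] section of ceph.conf on each mon host and restarting the mons is the usual alternative. The helper names, the dry-run default and the 60 s timeout are my own assumptions, not anything from Ceph itself.

```python
import shlex
import subprocess
from typing import List

# Option name and value as suggested above; "mon" as the target is an assumption.
OPTION = "mon_sync_max_payload_size"
VALUE = 4096


def config_get_cmd(option: str, who: str = "mon") -> List[str]:
    """Build argv for `ceph config get <who> <option>` (read current value)."""
    return ["ceph", "config", "get", who, option]


def config_set_cmd(option: str, value: int, who: str = "mon") -> List[str]:
    """Build argv for `ceph config set <who> <option> <value>`."""
    return ["ceph", "config", "set", who, option, str(value)]


def apply(option: str = OPTION, value: int = VALUE,
          dry_run: bool = True) -> List[List[str]]:
    """Show the old value, set the new one, then show it again.

    With dry_run=True (the default) the commands are only printed.
    """
    cmds = [config_get_cmd(option),
            config_set_cmd(option, value),
            config_get_cmd(option)]
    for cmd in cmds:
        print("$", shlex.join(cmd))
        if not dry_run:
            # Arbitrary timeout, since ceph commands may hang while mons are at 100% CPU.
            subprocess.run(cmd, check=True, timeout=60)
    return cmds


if __name__ == "__main__":
    apply(dry_run=True)
```

Run it once with the default dry run to review the commands, then call `apply(dry_run=False)` to actually execute them.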

Otherwise, I'll try to take an in-depth look and see if I can come up with
something smart (right now I need to go catch some sleep).

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx