On 1/10/17, 5:35 AM, "John Spray" <jspray@xxxxxxxxxx> wrote:

>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>> single node, two OSD cluster, and after a while I noticed that the new
>> ceph-mgr daemon is frequently using a lot of the CPU:
>>
>> 17519 ceph 20 0 850044 168104 208 S 102.7 4.3 1278:27
>> ceph-mgr
>>
>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
>> usage down to < 1%, but after a while it climbs back up to > 100%. Has
>> anyone else seen this?
>
>Definitely worth investigating, could you set "debug mgr = 20" on the
>daemon to see if it's obviously spinning in a particular place?

I've injected that option into the ceph-mgr process, and now I'm just
waiting for it to go out of control again.

However, I've noticed quite a few messages like this in the logs already:

2017-01-10 09:56:07.441678 7f70f4562700 0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2 cs=1 l=0).fault initiating reconnect
2017-01-10 09:56:07.442044 7f70f4562700 0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
2017-01-10 09:56:07.442067 7f70f4562700 0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
2017-01-10 09:56:07.443026 7f70f4562700 0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to send and in the half accept state just closed

What's weird about that is that this is a single node cluster with
ceph-mgr, ceph-mon, and the ceph-osd processes all
running on the same host. So none of the communication should be
leaving the node.

Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com