We are currently running 3 MONs. When one goes into silly town, the others get wedged and won't respond well. I don't think more MONs would solve that... but I'm not sure.

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Friday, January 29, 2021 12:58 PM
To: Paul Mezzanini; ceph-users@xxxxxxx
Subject: Re: OSDs cannot join, MON leader at 100%

Hi Paul,

thanks for sharing. I have the MONs on 2x10G bonded active-active. They don't manage to saturate 10G, but the CPU core is overloaded.

How many MONs do you have? I don't believe I have ever seen more than 2 in this state for an extended period of time. My plan is to go from 3 to 5, which would still leave a sub-cluster of 3, and I would be less hesitant to restart an affected MON right away.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Paul Mezzanini <pfmeec@xxxxxxx>
Sent: 29 January 2021 17:44:42
To: Frank Schilder; ceph-users@xxxxxxx
Subject: Re: OSDs cannot join, MON leader at 100%

We've been watching our MONs go unresponsive with a saturated 10GbE NIC. The problem seems to be aggravated by peering: while we were shrinking the PG count on one of our large pools it happened a lot, and once that finished things calmed down. Yesterday I had an OSD go down, and while the cluster was rebalancing we had another MON go into silly mode.

We recover from this situation by simply restarting the MON process on the hung node. We are running 14.2.15.

I wish I could tell you what the problem actually is and how to fix it. At least we aren't alone in this failure mode.

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx
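
For reference, on a systemd-managed cluster the recovery step described above might look roughly like this. The mon ID "ceph-01" is only a placeholder here; the exact daemon and unit names depend on how the MONs were deployed:

    # Check quorum and identify the leader (this may hang while a MON is wedged).
    ceph quorum_status -f json-pretty

    # On the hung node, query the MON directly via its admin socket.
    ceph daemon mon.ceph-01 mon_status

    # Restart only the MON daemon on that node; the OSDs/MDS on the host are untouched.
    systemctl restart ceph-mon@ceph-01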
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Friday, January 29, 2021 5:22 AM
To: ceph-users@xxxxxxx
Subject: OSDs cannot join, MON leader at 100%

Dear cephers,

I was doing some maintenance yesterday involving shutdown/power-up cycles of ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After the reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers, and I had no such problems with the other 11.

I also observed that "ceph status" was responding very slowly. Upon further inspection, I found that 2 of my 3 MONs (the leader and one peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. In our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity. This state lasted for over one hour. After the MONs settled down, the OSDs finally joined and everything went back to normal.

The only other time I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to too large a value for mon_sync_max_payload_size. This time, however, I'm pretty sure it was MON-client communication; see below. Are there any settings similar to mon_sync_max_payload_size that could influence the responsiveness of the MONs in a similar way?

Why do I suspect MON-client communication? In our monitoring I do not see the huge number of packets sent by the MONs arriving at any other ceph daemon. They seem to be distributed over the client nodes, but since we have a large number of client nodes (>550), this is hidden in the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, the leader held a large share of the client sessions. For example, yesterday the client session count per MON was:

ceph-01: 1339 (leader)
ceph-02:  189 (peon)
ceph-03:  839 (peon)

I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.

Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).

Thanks for any clues!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
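
As a rough sketch, the per-MON session counts above can be gathered from each MON's admin socket, and mon_sync_max_payload_size can be lowered if a slow MON re-sync is suspected. The grep patterns and the 4096-byte value are only examples, not tested recommendations:

    # Approximate count of client sessions on this MON (run on each MON host).
    ceph daemon mon.ceph-01 sessions | grep -c client

    # Show which MON is currently the leader.
    ceph quorum_status -f json-pretty | grep quorum_leader_name

    # Try a smaller sync payload if MON re-sync is the suspected bottleneck.
    ceph config set mon mon_sync_max_payload_size 4096

    # Restart the leader when it holds by far the most client sessions.
    systemctl restart ceph-mon@ceph-01

After a restart, clients reconnect to another MON from the monmap, so the session counts are worth re-checking afterwards.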