Forgot to mention: all nodes are currently running Luminous 12.2.8 on CentOS 7.5.
On 12/19/18 5:34 PM, Andras Pataki wrote:
Dear ceph users,
We have a large-ish Ceph cluster with about 3500 OSDs. We run 3 mons
on dedicated hosts; the mons typically use a few percent of a core
and generate about 50 Mbit/s of network traffic. They are connected
at 20 Gbit/s (bonded dual 10 Gbit) and run on 2x14-core servers.
We recently had to shut Ceph down completely for maintenance (which we
rarely do), and had significant difficulty starting it back up. The
symptoms included OSDs hanging on startup, being marked down, flapping,
and so on. After some investigation we found that the 20 Gbit/s network
interfaces of the monitors were completely saturated while the OSDs were
starting, with the monitor processes using about 3 cores (300% CPU). We
ended up having to start the OSDs very slowly so that the monitors could
keep up - it took about 4 hours to start 3500 OSDs (roughly one OSD
every 4 seconds).
We tried setting noout and nodown, but that didn't really help either.
A few questions we'd like to understand in order to move to a better
configuration:
1. How does the monitor traffic scale with the number of OSDs?
Presumably the traffic comes from distributing cluster maps as the
cluster changes while OSDs start. The cluster map is perhaps O(N) for N
OSDs, and each OSD needs an update on every cluster change, which would
make a single change O(N^2) in traffic. As OSDs start, the cluster
changes quite a lot (N times?), so would that make the startup traffic
O(N^3)? If so, that sounds pretty scary for scalability (a rough
back-of-envelope sketch is below, after the questions).
2. Would adding more monitors help here? Presumably each OSD gets its
maps from one monitor, so more monitors would share the traffic. Would
the inter-monitor communication/elections/etc. become problematic with
more monitors (5, 7 or even more)? Would more monitors be recommended?
If so, how many is practical?
3. Are there any config parameters useful for tuning the traffic
(perhaps sending monitor updates less frequently, or something along
those lines)?
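To make the concern in question 1 concrete, here is a rough back-of-envelope
sketch. It is not a measurement: the per-OSD map size and the assumption of
roughly one map epoch per OSD boot are guesses, and in practice incremental
maps and OSD-to-OSD map sharing should reduce the totals considerably.

    # Rough worst-case estimate of monitor traffic during a cold start.
    # All constants below are assumptions, not measured values.
    n_osds = 3500
    bytes_per_osd_in_map = 200  # assumed per-OSD share of a full osdmap

    map_size = n_osds * bytes_per_osd_in_map  # O(N): one full map, ~0.7 MB
    per_epoch = map_size * n_osds   # O(N^2): one epoch to every OSD, ~2.5 GB
    cold_start = per_epoch * n_osds # O(N^3): ~N epochs at startup, ~8.6 TB

    print("full map    ~%.1f MB" % (map_size / 1e6))
    print("one epoch   ~%.1f GB" % (per_epoch / 1e9))
    print("cold start  ~%.1f TB" % (cold_start / 1e12))

Even if the constants here are off by an order of magnitude, the shape of
the growth is what worries us.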
Any other advice on this topic would also be helpful.
Thanks,
Andras