>> If your public network is saturated, that actually is a problem; the last thing you want is to add recovery traffic or to slow down heartbeats. For most people, it isn’t saturated.
>
> See Frank Schilder's post about a meltdown which he believes could have
> been caused by beacon/heartbeat messages being drowned out by other recovery/IO
> traffic, not at the network level, but at the processing level on the OSDs.
>
> If there are indeed cases where the OSDs are too busy to send (or process)
> heartbeat/beacon messages, having a separate network wouldn't help, would it?

Agreed. Many times I’ve had to argue that CPUs that aren’t nearly saturated *aren’t* necessarily overkill, especially with fast media where latency hurts. It would be interesting to consider an architecture where a core/HT is dedicated to the control plane.

That said, I’ve seen a situation where excess CPU capacity appeared to hurt latency by letting idle CPUs drop into deeper C-states. This especially affected network traffic (2x dual 10GE). Curiously, some systems in the same cluster experienced this and others didn’t. There was a mix of Sandy Bridge and Ivy Bridge IIRC, as well as different Broadcom chips. Although the problem appeared to line up with older vs. newer Broadcom chips, I never fully characterized it: replacing one of the Broadcom NICs in an affected system with the model used on unaffected systems didn’t resolve the issue. It’s possible that replacing the other one would have made a difference.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx