On Fri, Jul 17, 2015 at 12:19 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> Maybe try some iperf tests between the different OSD nodes in your
> cluster and also the client to the OSDs.

This proved to be an excellent suggestion. One of these is not like the others:

f16 inbound: 6Gbps
f16 outbound: 6Gbps
f17 inbound: 6Gbps
f17 outbound: 6Gbps
f18 inbound: 6Gbps
f18 outbound: 1.2Mbps

I have no explanation for the outbound performance on f18. There are no errors in ifconfig/netstat, nothing logged on the switch, etc. Even with tcpdump running during iperf, there are no retransmits or anything else unusual. It's just slow.

Taking the bond's primary interface down with ifconfig immediately resolved the problem. The iostat running in the virtual machine immediately surged to 500+ IOPS and 40M-60M/sec.

Weirdly, bringing the primary device back up with ifconfig did not bring the problem back. The bond switched back to that interface, but everything is still fine (and iperf gives 6Gbps) at the moment. There's no way of telling whether that will last, but it's a solid lead either way.

The NIC is an onboard dual-port Intel X540 using the ixgbe driver. If it were a driver problem, we've got tons of these, so I'd expect to see this problem elsewhere. If it's a hardware problem, ifconfig down/up doesn't seem like it would "fix" it.

Very mysterious!

Thanks!
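
For anyone repeating this check across more nodes, here is a minimal Python sketch of how the odd one out could be flagged automatically. The throughput figures are the ones reported above. The function name find_slow_links and the "below half of the median" threshold are assumptions made for illustration and are not part of iperf or any Ceph tooling.

```python
from statistics import median

# Throughput per (host, direction) in bits per second, taken from the
# iperf results reported above.
measurements = {
    ("f16", "inbound"): 6e9,
    ("f16", "outbound"): 6e9,
    ("f17", "inbound"): 6e9,
    ("f17", "outbound"): 6e9,
    ("f18", "inbound"): 6e9,
    ("f18", "outbound"): 1.2e6,
}


def find_slow_links(results, ratio=0.5):
    """Return the keys whose throughput is below ratio * median.

    The 0.5 default is an arbitrary, hypothetical threshold.
    """
    typical = median(results.values())
    return sorted(key for key, bps in results.items() if bps < ratio * typical)


slow = find_slow_links(measurements)
print(slow)
```

Comparing against the median rather than the mean keeps a single very slow link from dragging down the baseline it is being compared to.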