Wouldn't it be nice if Ceph did something like this in the background
(some sort of network scrub)? Debugging the network like this is not
that easy (we can't expect admins to install e.g. perfSONAR on all nodes
and/or clients).
Something like: every N minutes, each service X picks a service Y on
another host (assuming X and Y will communicate at some point anyway,
like an OSD with other OSDs), sends 1 MB of data, and makes the timing
data available so we can monitor it and detect underperforming links
over time. Ideally clients would also do this, but I'm not sure where
they should report/store the data.
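A minimal sketch of such a probe in Python. It assumes a plain TCP
connection to a hypothetical sink listener on each peer; `pick_peer`,
`sink_server`, and `probe` are illustrative names, not part of Ceph or
any Ceph API. The sink drains the payload in memory and then acks, so
the measured time covers the network rather than the peer's disk.

```python
import random
import socket
import threading
import time

PROBE_BYTES = 1024 * 1024  # 1 MB (1 MiB here) per probe, as proposed above


def pick_peer(self_host, services):
    """Pick a random (host, service) entry that lives on a different host."""
    candidates = [svc for svc in services if svc[0] != self_host]
    return random.choice(candidates) if candidates else None


def sink_server(listener):
    """Hypothetical peer side: drain each probe in memory (no disk I/O), then ack."""
    while True:
        conn, _ = listener.accept()
        with conn:
            remaining = PROBE_BYTES
            while remaining > 0:
                chunk = conn.recv(min(65536, remaining))
                if not chunk:
                    break  # sender went away early
                remaining -= len(chunk)
            if remaining == 0:
                conn.sendall(b"k")  # 1-byte ack: all data arrived


def probe(addr, port):
    """Send PROBE_BYTES to a peer and return the seconds until its ack arrives."""
    payload = bytes(PROBE_BYTES)
    with socket.create_connection((addr, port), timeout=10) as conn:
        start = time.perf_counter()
        conn.sendall(payload)
        ack = conn.recv(1)
        elapsed = time.perf_counter() - start
    if ack != b"k":
        raise ConnectionError("probe was not acknowledged")
    return elapsed


if __name__ == "__main__":
    # Demo against a local sink; real use would target a peer on another host.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    threading.Thread(target=sink_server, args=(listener,), daemon=True).start()
    seconds = probe("127.0.0.1", port)
    print(f"{PROBE_BYTES * 8 / seconds / 1e6:.1f} Mb/s")
```

A single 1 MB transfer on a fresh TCP connection includes connection
setup and slow start, so these numbers are best compared against each
other over time rather than against the link's rated speed.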
Interpreting the data can be a bit tricky, but extreme outliers will be
spotted easily, and the main difficulty with this sort of debugging is
collecting the data in the first place.
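For spotting the extreme outliers, one simple option (not something
proposed in this thread) is to compare every link against the median
throughput of all links and flag those far below it. The 0.5 cut-off
and the names `slow_links` and `suspect_hosts` are hypothetical choices
for this sketch. The second function reflects the idea that a bad cable
or switch port on one node slows down every link touching that node.

```python
from collections import Counter
from statistics import median


def slow_links(throughput, fraction=0.5):
    """Return links whose throughput is below `fraction` x the median of all links.

    `throughput` maps (src_host, dst_host) -> measured throughput (any unit).
    The 0.5 default is a hypothetical cut-off, not a recommended value.
    """
    if not throughput:
        return []
    med = median(throughput.values())
    return sorted(link for link, value in throughput.items() if value < fraction * med)


def suspect_hosts(flagged_links):
    """Rank hosts by how many flagged links they take part in (most first)."""
    counts = Counter()
    for src, dst in flagged_links:
        counts[src] += 1
        counts[dst] += 1
    return counts.most_common()
```

The median is used instead of the mean so that a few very slow links do
not drag the reference point down; it becomes unreliable once a large
share of the links are degraded.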
Simply reporting / keeping track of ongoing communications would
already be a big step forward, but then we also need the size of the
exchanged data to interpret the timings (and the timing should cover
only the network part, not e.g. flushing data to disk in the case of an
OSD). Obviously sampling is enough; there's no need to record details of
every bit sent.
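Passive tracking with sampling could look roughly like the following
Python sketch. `LinkSampler` is a hypothetical helper, and it assumes
the messaging layer can report, per message, the payload size and the
time spent on the wire alone (for an OSD, excluding any disk flush on
the receiver); whether a real messenger exposes such a hook is not
established here.

```python
import random
from collections import defaultdict


class LinkSampler:
    """Hypothetical per-peer tracker of sampled bytes and network-only time."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate  # fraction of messages to keep
        self.rng = rng or random.Random()
        self.bytes = defaultdict(int)
        self.seconds = defaultdict(float)
        self.samples = defaultdict(int)

    def record(self, peer, nbytes, network_seconds):
        """Hypothetical hook: call once per message with wire time only."""
        # Sampling is enough: keep only a random fraction of messages.
        if self.rng.random() >= self.sample_rate:
            return
        self.bytes[peer] += nbytes
        self.seconds[peer] += network_seconds
        self.samples[peer] += 1

    def throughput(self):
        """Return {peer: bytes per second} over the sampled messages."""
        return {
            peer: self.bytes[peer] / self.seconds[peer]
            for peer in self.bytes
            if self.seconds[peer] > 0
        }
```

Throughput is computed as total sampled bytes over total sampled
seconds, which weights large messages more heavily than averaging
per-message rates would; for small messages, latency rather than
bandwidth dominates anyway.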
stijn
On 07/30/2015 08:04 PM, Mark Nelson wrote:
Thanks for posting this! We see issues like this more often than you'd
think. It's really important too because if you don't figure it out the
natural inclination is to blame Ceph! :)
Mark
On 07/30/2015 12:50 PM, Quentin Hartman wrote:
Just wanted to drop a note to the group that I had my cluster go
sideways yesterday, and the root of the problem was networking again.
Using iperf, I discovered that one of my nodes was only moving data at
1.7 Mb/s. Moving that node to a different switch port with a different
cable resolved the problem. It took a while to track down because none
of the server-side error metrics for disk or network showed anything
amiss, and I didn't think to test network performance (as suggested in
another thread) until well into the process.
Check networking first!
QH
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com