On 07/31/2015 05:21 AM, John Spray wrote:
On 31/07/15 06:27, Stijn De Weirdt wrote:
wouldn't it be nice if ceph did something like this in the background
(some sort of network-scrub)? debugging the network like this is not
that easy (you can't expect admins to install e.g. perfsonar on all
nodes and/or clients).
something like: every N minutes, each service X picks a service Y on
another host (assuming X and Y will exchange some communication at
some point, like an osd with another osd), sends 1MB of data, and
makes the timing data available so we can monitor it and detect
underperforming links over time (a rough sketch of such a probe is
further down).
ideally clients would also do this, but i'm not sure where they should
report/store the data.
interpreting the data can be a bit tricky, but extreme outliers will
be spotted easily, and the main issue with this sort of debugging is
collecting the data.
simply reporting / keeping track of ongoing communications would
already be a big step forward, but then we need the size of the
exchanged data to allow interpretation (and the timing should cover
only the network part, not e.g. flushing data to disk in the case of
an osd). (and obviously sampling is enough, no need for details of
every bit sent).
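a rough sketch of what i mean, just to show the shape of it (plain tcp
sockets, and the names/port are hypothetical, nothing ceph-specific):

# sketch of a periodic 1MB network probe -- not ceph code
import random
import socket
import time

PAYLOAD = b"\0" * (1024 * 1024)   # the 1MB test message
PORT = 7777                       # arbitrary port for the sketch

def serve():
    """responder side: read 1MB from a peer, send a short ack back."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        remaining = len(PAYLOAD)
        while remaining:
            chunk = conn.recv(min(remaining, 65536))
            if not chunk:
                break
            remaining -= len(chunk)
        conn.sendall(b"ok")
        conn.close()

def probe(peer):
    """send 1MB to one peer, return (elapsed seconds, MB/s)."""
    start = time.time()
    conn = socket.create_connection((peer, PORT), timeout=10)
    conn.sendall(PAYLOAD)
    conn.recv(2)        # wait for the ack so we time the whole exchange
    conn.close()
    elapsed = time.time() - start
    return elapsed, 1.0 / elapsed

def scrub_once(peers):
    """pick one random peer and record how the link to it performed."""
    peer = random.choice(peers)
    elapsed, mbs = probe(peer)
    print(f"{peer}: {elapsed:.3f}s for 1MB ({mbs:.1f} MB/s)")

every N minutes a timer would call scrub_once() and ship the MB/s
numbers to whatever is collecting metrics; outliers in those numbers
are the underperforming links.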
Yes, it's a reasonable concept, although it's not clear that we'd
necessarily want it built into existing ceph services. For example,
where there are several OSDs running on a host, we don't really want all
the OSDs redundantly verifying that particular host's network
functionality. This use case is a pretty good argument for a ceph
supervisor service of some kind that exists on a one-per-host basis. The
trick is finding someone with time to write it :-)
The prior art here is Lustre's LNET self test (LST), which exists for
exactly these reasons (Mark will have memories of this too I'm sure).
Haha, yes! So I think something written to use the messenger might be
the right way to go. I'll probably build all-to-all and one-to-all
tests using iperf into CBT at some point as a holdover, since I've got a
couple of simple scripts that do that already. It'd probably be fairly
easy to implement.
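For reference, those scripts are roughly this shape (a sketch only:
the hostnames, port, passwordless ssh, and iperf3 with JSON output are
assumptions, not the eventual CBT integration):

# all-to-all iperf3 sweep over ssh -- a sketch, not the CBT code
import itertools
import json
import subprocess

HOSTS = ["host-a", "host-b", "host-c"]   # hypothetical hostnames
PORT = 5201

def start_servers():
    """start an iperf3 server daemon on every host."""
    for host in HOSTS:
        subprocess.run(["ssh", host, f"iperf3 -s -D -p {PORT}"], check=True)

def run_pair(src, dst):
    """run one short iperf3 test from src to dst, return Gbit/s received."""
    result = subprocess.run(
        ["ssh", src, f"iperf3 -c {dst} -t 5 -J -p {PORT}"],
        capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    start_servers()
    for src, dst in itertools.permutations(HOSTS, 2):
        print(f"{src} -> {dst}: {run_pair(src, dst):.2f} Gbit/s")

The one-to-all case is the same loop with a fixed source host.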
John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com