Re: Check networking first?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Even a single ping at the maximum MTU with the don't-fragment flag set
could tell a lot about connectivity issues and latency without
generating much traffic. Using the Ceph messenger would be even better,
since it would also check the firewall ports. I like the idea of
incorporating simple network checks into Ceph. The monitor could
correlate failures and help determine whether the problem is related to
a single host in the CRUSH map.
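
As a rough sketch of the ping check above, assuming IPv4 and the Linux
iputils ping (where "-M do" forbids fragmentation and "-s" sets the ICMP
payload size): the payload has to be the MTU minus the 20-byte IP header
and the 8-byte ICMP header. The function names and the host name are
made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an ICMP echo packet")
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host: str, mtu: int = 1500, count: int = 5) -> list:
    """Build (but do not run) a Linux iputils ping command with DF set."""
    return ["ping", "-M", "do",
            "-s", str(icmp_payload_for_mtu(mtu)),
            "-c", str(count), host]


if __name__ == "__main__":
    # "osd-host-1" is a placeholder host name.
    print(" ".join(ping_command("osd-host-1", mtu=9000)))
```

If the path MTU is smaller than expected anywhere between the two hosts,
a ping built this way should fail instead of being silently fragmented.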
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt  wrote:
> wouldn't it be nice if ceph did something like this in the background (some
> sort of network-scrub)? debugging the network like this is not that easy
> (you can't expect admins to install e.g. perfsonar on all nodes and/or clients)
>
> something like: every N min, each service X picks a service Y on another host
> (assuming X and Y will exchange some communication at some point; like an osd
> with another osd), sends 1MB of data, and makes the timing data available so
> we can monitor it and detect underperforming links over time.
>
> ideally clients also do this, but not sure where they should report/store
> the data.
>
> interpreting the data can be a bit tricky, but extreme outliers will be
> spotted easily, and the main issue with this sort of debugging is collecting
> the data.
>
> simply reporting / keeping track of ongoing communications is already a big
> step forward, but then we need to have the size of the exchanged data to
> allow interpretation (and the timing should cover only the network part, not
> e.g. flushing data to disk in the case of an osd). (and obviously sampling is
> enough, no need to have details of every bit sent).
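
The interpretation step described above could look something like the
sketch below. It is only an illustration, not anything Ceph does: the
sample format (source host, destination host, bytes, seconds), the
function name slow_hosts and the 0.5 threshold are all assumptions. Each
timed transfer is credited to both endpoints, and a host is flagged when
its median throughput falls well below the median across all hosts.

```python
from collections import defaultdict
from statistics import median


def slow_hosts(samples, ratio=0.5):
    """Return {host: median_bytes_per_sec} for hosts that look slow.

    samples: iterable of (src_host, dst_host, n_bytes, seconds) tuples.
    ratio:   a host is flagged if its median rate is below
             ratio * (median of all per-host medians). Hypothetical default.
    """
    per_host = defaultdict(list)
    for src, dst, n_bytes, seconds in samples:
        if seconds <= 0:
            continue  # skip unusable timings
        rate = n_bytes / seconds
        # A slow link hurts both ends, so count the sample for each host.
        per_host[src].append(rate)
        per_host[dst].append(rate)
    if not per_host:
        return {}
    host_median = {h: median(rates) for h, rates in per_host.items()}
    cluster_median = median(host_median.values())
    return {h: m for h, m in host_median.items()
            if m < ratio * cluster_median}


if __name__ == "__main__":
    MB = 1_000_000
    # Made-up numbers: every transfer touching host "d" is very slow.
    samples = [
        ("a", "b", MB, 0.01), ("b", "c", MB, 0.01), ("c", "a", MB, 0.01),
        ("a", "d", MB, 5.0), ("d", "b", MB, 5.0), ("c", "d", MB, 5.0),
    ]
    print(sorted(slow_hosts(samples)))
```

Medians are used so that one bad link does not drag down the hosts on
its healthy end, which is roughly the per-host correlation a monitor
would need.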
>
>
>
> stijn
>
>
> On 07/30/2015 08:04 PM, Mark Nelson wrote:
>>
>> Thanks for posting this!  We see issues like this more often than you'd
>> think.  It's really important too because if you don't figure it out the
>> natural inclination is to blame Ceph! :)
>>
>> Mark
>>
>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
>>>
>>> Just wanted to drop a note to the group that I had my cluster go
>>> sideways yesterday, and the root of the problem was networking again.
>>> Using iperf I discovered that one of my nodes was only moving data at
>>> 1.7Mb / s. Moving that node to a different switch port with a different
>>> cable has resolved the problem. It took a while to track down because
>>> none of the server-side error metrics for disk or network showed
>>> anything was amiss, and I didn't think to test network performance (as
>>> suggested in another thread) until well into the process.
>>>
>>> Check networking first!
>>>
>>> QH
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVu7QoCRDmVDuy+mK58QAAcpAQAKbv6xPRxMMJ8NWrXym0
NAtZFIYywvStKfTG2pL1xjb2p/xDM+6Z5mnYJTBHb+0dkGIO6qe0jF9t4XEE
ppH+55eIpkCZrKMdfN1L0vUe9ldFnJS2jsAlGkvzyRLJale++q1evymIAaWb
JnEZgV3pGrPTCRaVKNrT3NaGZVDLm6ygnsT6PYJaiXM8Av3equ00Uls2/i6v
vZhlIBz5TbKsNag/W7cRJVvjj7YDsgU+dplDl62mmDJ6o+cWvILlf9WPINdV
MrmIeg+7fqUEp8nuEzTMm+BDHQ3c/5cxrYr8bksiVoBTXV7m9fO0Je9Exn6N
iWTa5eDUBtR6Ha8WaVUib/cvFj6j94QRNWYmXHl9lG50p+XZ0L5bZ1G8v9Nb
gGxRoYgAncp9M1J+7Pvm5z8wZgxXAs/veUtrf+6SkUbGyCRnUSn/VS7C8syJ
4WW2aWP/A0nxSDe1u+TGpkkPmhk7UDrJEfMQaZrFwS9FkFLfgLH7PxMcAZjJ
hlN129vldPh3QxLviLidlJmzUTvKtb+XrSkA0MjhFMJS2M79DR16j+XWe7Ub
wPnKpZcZ8WsQzOlTHtDEHQvhE3ilcm+4oALSiuqEAZKNKk8lUTtvfzJ2BKyu
Tv46c+Wf3LbwrdMnkGiMHLuIlqhQT2FzauM2Pi+Pt7QJ7L9xXfWW4vzdemxj
bBQD
=rPC0
-----END PGP SIGNATURE-----


