On Mon, Jun 26, 2017 at 6:21 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jun 26, 2017 at 12:56 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Mon, Jun 26, 2017 at 5:34 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> On Mon, Jun 26, 2017 at 11:52 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> Hi guys,
>>>>
>>>> I was pondering this and wondered if you had any existing plans...
>>>>
>>>> For doing network testing between two remote nodes, we'll need to be
>>>> able to spin up some sort of listener on one end, presumably via SSH
>>>> from a third-party node.
>>>
>>> What kind of testing warrants this type of setup?
>>
>> I'm talking about opening a TCP connection between two remote nodes to
>> verify that the network connectivity is working, and probably doing
>> this across a large set of pairs, e.g. doing an all-to-all ping pong
>> between OSD nodes. Obviously, there is just the standard `ping`, but
>> I'm expecting that we'll want to test using actual TCP traffic in the
>> port ranges that the OSDs would use.
>
> If we are checking connectivity between OSD nodes, wouldn't it be
> sufficient to test if the port where the OSD is listening can be
> reached? Again, we can use plain Python to send actual TCP traffic here.

I don't think we should assume a running OSD process for this -- partly
to enable someone to fully test their pre-Ceph configuration before they
install Ceph, but also because we would like to distinguish between
network issues and "OSD isn't listening/responding" issues.

John

> I think that what you are proposing was going to be checked as part of
> the 'network' collection. Managing processes that listen to each other
> and report whether they can connect sounds like something we can avoid.
>
> I have created a ticket to make sure that we collect inter-node
> connectivity:
>
> https://github.com/ceph/ceph-medic/issues/18
>
>>
>> John
>>
>>>>
>>>> I guess the choice here is whether to depend on having ceph-medic
>>>> already installed on all the nodes (and invoke it with a special
>>>> --receiver type argument) or whether the tool should inject its code
>>>> over SSH (e.g. run a big fat python command line with a script in it
>>>> over SSH).
>>>>
>>>
>>> That is kind of how this works already, borrowing from ceph-deploy: it
>>> uses SSH to connect to remote nodes and execute either system commands
>>> or Python code.
>>>
>>> In what scenario would a system call or Python code fail to gather
>>> information that a server/client setup would?
>>>
>>>> I lean towards the latter in the interests of making the deployment
>>>> simple, but I'm not sure what the story is with e.g. SELinux in
>>>> situations like this, whether a server is going to get unhappy about
>>>> an SSH session that tries to open ports.
>>>
>>> Having two processes running to check connectivity sounds a bit
>>> complicated to handle. One of the things the tool does is to
>>> cross-check against other nodes in the system, so this would
>>> potentially mean running a quadratic number of processes: one from
>>> every node to every other node in the cluster.
>>>
>>> It will be cheaper to perform those checks with either plain Python or
>>> a system call.
>>>
>>> Or maybe you mean some other type of check? What are your ideas on
>>> "network testing"?
>>>>
>>>> John
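A minimal Python sketch of the check John describes: a throwaway TCP listener that stands in for an OSD (so nothing Ceph-related has to be running), and a probe that does one ping/pong round trip against it. The function names, the `ping`/`pong` payload, and the 6800 to 7300 port range (assumed to match Ceph's default OSD messenger range) are illustrative assumptions, not ceph-medic code. How the listener gets started on the remote node (an installed ceph-medic versus code injected over SSH) is the open question in the thread and is not shown.

```python
import itertools
import socket
import threading

# Assumption: Ceph's default OSD messenger port range (6800 to 7300); a real
# check should take the range from the cluster's configuration instead.
OSD_PORT_RANGE = range(6800, 7301)

PING = b"ping"
PONG = b"pong"


def recv_exact(sock, n):
    """Read exactly n bytes, or fewer if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def open_listener(host, port):
    """Bind a throwaway TCP listener that stands in for an OSD.

    Port 0 lets the OS pick a free port, which is handy for local testing.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def serve_one(srv, timeout=10.0):
    """Accept one connection, answer a single ping with pong, then close."""
    srv.settimeout(timeout)
    try:
        conn, _ = srv.accept()
        with conn:
            conn.settimeout(timeout)
            if recv_exact(conn, len(PING)) == PING:
                conn.sendall(PONG)
    except OSError:
        pass  # timed out or failed; the probing side reports the failure
    finally:
        srv.close()


def probe(host, port, timeout=5.0):
    """Return True if a TCP ping/pong round trip to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(PING)
            return recv_exact(sock, len(PONG)) == PONG
    except OSError:
        return False  # refused, unreachable, or timed out


def all_pairs(nodes):
    """Ordered (source, destination) pairs for an all-to-all check."""
    return list(itertools.permutations(nodes, 2))


if __name__ == "__main__":
    # Local demo: listener and probe on the same host, OS-chosen port.
    srv = open_listener("127.0.0.1", 0)
    port = srv.getsockname()[1]
    listener = threading.Thread(target=serve_one, args=(srv,))
    listener.start()
    print("ping/pong ok:", probe("127.0.0.1", port))
    listener.join()
```

Because the listener is not an OSD, a failed probe points at the network (firewall, routing, SELinux policy) rather than at a daemon that isn't responding. For n nodes, `all_pairs` yields n(n-1) ordered probes, each needing a listener on its destination, which is the growth Alfredo is concerned about.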