On Mon, Jun 26, 2017 at 1:55 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Mon, Jun 26, 2017 at 6:21 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jun 26, 2017 at 12:56 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> On Mon, Jun 26, 2017 at 5:34 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>>> On Mon, Jun 26, 2017 at 11:52 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> Hi guys,
>>>>>
>>>>> I was pondering this and wondered if you had any existing plans...
>>>>>
>>>>> For doing network testing between two remote nodes, we'll need to be
>>>>> able to spin up some sort of listener on one end, presumably via SSH
>>>>> from a third-party node.
>>>>
>>>> What kind of testing warrants this type of setup?
>>>
>>> I'm talking about opening a TCP connection between two remote nodes to
>>> verify that the network connectivity is working, and probably doing
>>> this across a large set of pairs, e.g. doing an all-to-all ping-pong
>>> between OSD nodes. Obviously, there is just the standard `ping`, but
>>> I'm expecting that we'll want to test using actual TCP traffic in the
>>> port ranges that the OSDs would use.
>>
>> If we are checking connectivity between OSD nodes, wouldn't it be
>> sufficient to test whether the port where the OSD is listening can be
>> reached? Again, we can use plain Python to send actual TCP traffic here.
>
> I don't think we should assume a running OSD process for this --
> partly to enable someone to fully test their pre-Ceph configuration
> before they install Ceph, but also because we would like to
> distinguish between network issues and "OSD isn't
> listening/responding" issues.

Aha, important distinction: "pre-Ceph" vs. "post-Ceph". ceph-medic is
currently a "post-Ceph" tool. That doesn't mean it can't (or shouldn't)
have some kind of pre-flight checks. Those types of checks have to be
lenient and approached fairly differently, and I believe it was discussed
as something that will get implemented.
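For reference, the plain-Python reachability test discussed above can be sketched in a few lines. This is only an illustration (the helper name `check_tcp` is mine, not part of ceph-medic), but it shows that checking whether a given OSD port on a remote node accepts TCP connections needs nothing beyond the standard library:

```python
import socket

def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Uses a real TCP handshake (not ICMP ping), so it exercises the same
    path that OSD traffic would use on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

An all-to-all sweep would then just loop `check_tcp(node, port)` over every (node, port) pair in the OSD port range, which is cheap compared to managing listener processes.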
> John
>
>> I think that what you are proposing was going to be checked as part of
>> the 'network' collection. Managing processes that listen to each
>> other and report whether they do/don't sounds like it can be avoided.
>>
>> I have created a ticket to make sure that we collect the inter-node
>> connectivity:
>>
>> https://github.com/ceph/ceph-medic/issues/18
>>
>>> John
>>>
>>>>> I guess the choice here is whether to depend on having ceph-medic
>>>>> already installed on all the nodes (and invoke it with a special
>>>>> --receiver type argument) or whether the tool should inject its code
>>>>> over SSH (e.g. run a big fat Python command line with a script in it
>>>>> over SSH).
>>>>
>>>> That is kind of how this works already, borrowing from ceph-deploy: it
>>>> uses SSH to connect to remote nodes and execute either system commands
>>>> or Python code.
>>>>
>>>> In what scenario would a system call or Python code not gather as much
>>>> information as a server/client setup would?
>>>>
>>>>> I lean towards the latter in the interest of keeping deployment
>>>>> simple, but I'm not sure what the story is with e.g. SELinux in
>>>>> situations like this -- whether a server is going to get unhappy about
>>>>> an SSH session that tries to open ports.
>>>>
>>>> Having two processes running to check connectivity sounds a bit
>>>> complicated to handle. One of the things the tool does is to
>>>> cross-check against other nodes in the system, so this would
>>>> potentially mean running a quadratic number of processes: one for every
>>>> node against each other node in the cluster.
>>>>
>>>> It will be cheaper to perform those checks with either plain Python or
>>>> a system call.
>>>>
>>>> Or maybe you mean some other type of check? What are your ideas on
>>>> "network testing"?
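To make the "inject code over SSH" option concrete: the receiver could be a script small enough to ship as `python -c "<script>"` over an SSH session, so no pre-installed agent is needed on the remote node. The sketch below is an assumption about how such a receiver might look (the names `start_echo_listener` and `serve_once` are hypothetical, not existing ceph-medic or ceph-deploy APIs); it binds a port, accepts a single connection, and echoes one message so the client side can verify the round trip:

```python
import socket
import threading  # only needed when driving the listener in-process, as in a test

def start_echo_listener(port=0, timeout=30.0):
    """Bind a TCP socket and return (sock, bound_port).

    port=0 asks the OS for any free port; when run remotely, the script
    would print the bound port on stdout so the SSH caller can read it.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(1)
    srv.settimeout(timeout)  # give up if no peer ever connects
    return srv, srv.getsockname()[1]

def serve_once(srv):
    """Accept one connection, echo the first message back, then exit."""
    conn, _addr = srv.accept()
    try:
        conn.sendall(conn.recv(1024))
    finally:
        conn.close()
        srv.close()
```

This also illustrates Alfredo's scaling concern: each pairwise check needs a listener on one node and a client on the other, which is exactly the per-pair process management that a simple connect-only check avoids.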
>>>>>
>>>>> John
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html