Re: Remote execution in ceph-medic

John Spray <jspray@xxxxxxxxxx> · Mon, 26 Jun 2017 18:55:19 +0100

On Mon, Jun 26, 2017 at 6:21 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jun 26, 2017 at 12:56 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Mon, Jun 26, 2017 at 5:34 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> On Mon, Jun 26, 2017 at 11:52 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> Hi guys,
>>>>
>>>> I was pondering this and wondered if you had any existing plans...
>>>>
>>>> For doing network testing between two remote nodes, we'll need to be
>>>> able to spin up some sort of listener on one end, presumably via SSH
>>>> from a third party node.
>>>
>>> What kind of testing warrants this type of setup?
>>
>> I'm talking about opening a TCP connection between two remote nodes to
>> verify that the network connectivity is working, and probably doing
>> this across a large set of pairs e.g. doing an all-to-all ping pong
>> between OSD nodes.  Obviously, there is just the standard `ping`, but
>> I'm expecting that we'll want to test using actual TCP traffic in the
>> port ranges that the OSDs would use.
>
> If we are checking connectivity between OSD nodes, wouldn't it be
> sufficient to test if the port where the OSD is listening can be
> reached? Again, we can use plain Python to send actual TCP traffic here.

I don't think we should assume a running OSD process for this --
partly to enable someone to fully test their pre-Ceph configuration
before they install Ceph, but also because we would like to
distinguish between network issues and "OSD isn't
listening/responding" issues.
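
To make that concrete, here is a rough sketch of the kind of throwaway
listener/probe pair I have in mind. It is plain Python with no Ceph
dependency. The names (start_listener, serve_once, probe) and the
ping/pong payload are made up for illustration and are not part of
ceph-medic. In a real check the listener would run on one node and the
probe on another, on a port in the range the OSDs would use; the usage
lines at the bottom only exercise it over loopback.

```python
import socket
import threading


def serve_once(server_sock, payload=b"pong"):
    """Accept a single connection, read the probe, and answer with payload."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.recv(16)
        conn.sendall(payload)


def start_listener(host="127.0.0.1", port=0):
    """Bind a throwaway TCP listener (hypothetical helper, not ceph-medic API).

    Port 0 asks the kernel for any free port; a real check would pass a
    port from the range the OSDs are expected to use.
    """
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((host, port))
    server_sock.listen(1)
    thread = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
    thread.start()
    return server_sock, thread


def probe(host, port, timeout=2.0):
    """Open a TCP connection, send b"ping", and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ping")
            reply = sock.recv(16)
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, str(exc)
    return reply == b"pong", "reply=%r" % reply


if __name__ == "__main__":
    # Loopback demonstration only; the two halves would normally run on
    # different nodes.
    server_sock, thread = start_listener()
    listen_port = server_sock.getsockname()[1]
    print(probe("127.0.0.1", listen_port))
    thread.join(2)
    server_sock.close()
```

Because the listener is ours, a failed probe points at the network (or
a firewall) rather than at an OSD that is down or not responding.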

John

> I think that what you are proposing was going to be checked as part of
> the 'network' collection. Managing processes that listen to each
> other and report if they do/don't sounds like it can be avoided.
>
> I have created a ticket to make sure that we collect inter-node connectivity:
>
>     https://github.com/ceph/ceph-medic/issues/18
>
>
>>
>> John
>>
>>>>
>>>> I guess the choice here is whether to depend on having ceph-medic
>>>> already installed on all the nodes (and invoke it with a special
>>>> --receiver type argument) or whether the tool should inject its code
>>>> over SSH (e.g. run a big fat python command line with a script in it
>>>> over SSH).
>>>>
>>>
>>> That is kind of how this works already, borrowing from ceph-deploy: it
>>> uses SSH to connect to remote nodes and execute either system commands
>>> or Python code.
>>>
>>> In what scenario would a system call or Python code fail to gather
>>> information that a server/client setup would?
>>>
>>>> I lean towards the latter in the interests of making the deployment
>>>> simple, but I'm not sure what the story is with e.g. selinux in
>>>> situations like this, whether a server is going to get unhappy about
>>>> an SSH session that tries to open ports.
>>>
>>> Having two processes running to check connectivity sounds a bit
>>> complicated to handle. One of the things the tool does is to
>>> cross-check against other nodes in the system, so this would
>>> potentially mean running a quadratic number of processes: one for
>>> every node to each other node in the cluster.
>>>
>>> It will be cheaper to perform those checks with either plain Python or
>>> a system call.
>>>
>>> Or maybe you mean some other type of check? What are your ideas on
>>> "network testing"?
>>>>
>>>> John