Re: OSD rebind connects to ports of other OSDs

On 20-12-2016 19:43, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>> On 20-12-2016 16:23, Sage Weil wrote:
>>> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>>>> On 20-12-2016 11:21, Willem Jan Withagen wrote:
>>>>> Hi,
>>>>>
>>>>> I've been banging my head against the wall for some time now.
>>>>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>>>>
>>>>> When rebinding it connects to the ports of OSD.1 because those ports are
>>>>> the first not in the avoid_list. That should be refused since these
>>>>> sockets belong to a different process.
>>>>> UNLESS SO_REUSEPORT is set:
>>>>>  SO_REUSEPORT allows completely duplicate bindings by multiple processes
>>>>>  if they all set SO_REUSEPORT before binding the port.  This option
>>>>>  permits multiple instances of a program to each receive UDP/IP
>>>>>  multicast or broadcast datagrams destined for the bound port.
>>>>>
>>>>> That seems to be what is happening here.
>>>>> Output from sockstat in this state:
>>>>> wjw      ceph-osd-0   43305 14 tcp4   *:6800                *:*
>>>>> wjw      ceph-osd-0   43305 15 tcp4   127.0.0.1:6804        *:*
>>>>> wjw      ceph-osd-0   43305 16 tcp4   127.0.0.1:6805        *:*
>>>>> wjw      ceph-osd-0   43305 45 tcp4   127.0.0.1:6806        *:*
>>>>> wjw      ceph-osd-1   43318 14 tcp4   *:6804                *:*
>>>>> wjw      ceph-osd-1   43318 15 tcp4   *:6805                *:*
>>>>> wjw      ceph-osd-1   43318 16 tcp4   *:6806                *:*
>>>>> wjw      ceph-osd-1   43318 17 tcp4   *:6807                *:*
>>>>>
>>>>> Which clearly demonstrates the mess.
>>>>> However, that option is not set anywhere in the Ceph code, nor is it a
>>>>> setting that "just" gets set.
>>>>>
>>>>> Any suggestions where to look for this option to get set in an
>>>>> incidental/bug way would be much appreciated.
>>>>> Or a suggestion on how to easily debug this.
>>>>
>>>> Right,
>>>>
>>>> Compatibility in this area is rather thin. :)
>>>>
>>>> For the question skip to the end.
>>>>
>>>> So I'm going to need some functional description, to see if I can get it
>>>> right.
>>>>
>>>> The OSD starts and builds a few messengers with SO_REUSEADDR set on the
>>>> socket.
>>>>   - On Linux, ports that are already taken are reported as in use.
>>>>   - The same holds on FreeBSD during startup: ports are iterated
>>>>     through nicely and sequential ports are selected.
>>>> So that is how it should be.
>>>>
>>>> Now when the osd has gone down and comes up, it reports:
>>>>   log_channel(cluster) log [WRN] : map e18 wrongly marked me down
>>>> on ./src/osd/OSD.cc:7120
>>>>
>>>> Then it starts rebinding on its messenger connections:
>>>>         int r = cluster_messenger->rebind(avoid_ports)
>>>> on ./src/osd/OSD.cc:7192.
>>>> It calls shutdown_connections() to shut down all of its connections.
>>>>
>>>> Somewhere down the line SO_REUSEADDR is set again on the socket and the
>>>> socket is bound.
>>>>  - Linux grabs the next available ports at the end, because its own
>>>>    channels are to be avoided and the rest are taken.
>>>>
>>>>  - On FreeBSD the first available port is taken. If that is 6800,
>>>>    then that is taken, even if the socket is owned by a different
>>>>    process, which (per the man page) would require SO_REUSEPORT.
>>>>
>>>> If I disable SO_REUSEADDR in NetHandler::create_socket()
>>>> ====
>>>>   /* Make sure connection-intensive things like the benchmark
>>>>    * will be able to close/open sockets a zillion of times */
>>>>   if (reuse_addr) {
>>>>     if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
>>>>       lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
>>>>                  << strerror(errno) << dendl;
>>>>       close(s);
>>>>       return -errno;
>>>>     }
>>>>   }
>>>> ====
>>>> Then things start to work "as expected" and a port is refused when it
>>>> already has a listener bound to it.
>>>>
>>>> Doing this has the disadvantage that it is not possible to immediately
>>>> kill and restart the OSD, because the ports are not yet released in the
>>>> netstat table. But that is a manageable issue, and that time can be
>>>> shortened by setting a sysctl.
>>>>
>>>> So the questions are:
>>>>  - How much rebinding is required?
>>>
>>> I think it's just for tests.  My recollection is that we did this just 
>>> because we can run out of ports since we can't reuse one until the tcp 
>>> finwait2 (or whatever) timeout expires.
>>>
>>>>  - And why do we set SO_REUSEADDR if we are going to add the ports to
>>>>  	avoid_ports, so that a completely new port is required anyway?
>>>
>>> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics 
>>> are in fact different.
>>
>> When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
>> Will submit a PR.
> 
> Please #ifdef it so it's only excluded for FreeBSD.

Yup,

in #12593
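
Roughly along these lines. This is only a standalone sketch of the idea,
not the patch itself: create_socket_sketch() and reuse_addr_enabled() are
hypothetical names made up for illustration, and the real
NetHandler::create_socket() also does its own logging via lderr(cct).
The point is just that SO_REUSEADDR is skipped at compile time on FreeBSD
and left alone everywhere else.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical stand-in for NetHandler::create_socket(): creates a TCP
// socket and sets SO_REUSEADDR only when not building for FreeBSD.
// Returns the file descriptor, or -errno on failure.
int create_socket_sketch(int domain, bool reuse_addr) {
  int s = ::socket(domain, SOCK_STREAM, 0);
  if (s < 0)
    return -errno;

#if !defined(__FreeBSD__)
  if (reuse_addr) {
    int on = 1;
    if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
      int err = errno;  // save errno before close() can overwrite it
      ::close(s);
      return -err;
    }
  }
#else
  (void)reuse_addr;  // option deliberately not set on FreeBSD
#endif
  return s;
}

// Hypothetical helper: reports whether SO_REUSEADDR is set on fd.
bool reuse_addr_enabled(int fd) {
  int val = 0;
  socklen_t len = sizeof(val);
  if (::getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
    return false;
  return val != 0;
}
```

Calling create_socket_sketch(AF_INET, true) and then reuse_addr_enabled()
on the result should report the option as set on Linux and as not set on
FreeBSD.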

--WjW
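
PS: For anyone who wants to check the platform difference outside of
Ceph, here is a small standalone probe. It is a sketch under my own
assumptions, not Ceph code: try_bind_listen() and bound_port() are
hypothetical helpers. One socket listens on the wildcard address, and a
second one then tries to bind 127.0.0.1 on the same port, which is the
combination seen in the sockstat output earlier in the thread. As far as
I understand it, Linux refuses the second bind with EADDRINUSE even when
SO_REUSEADDR is set, while FreeBSD allows it because the two addresses
differ.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper: create a TCP socket, optionally set SO_REUSEADDR,
// bind it to ip:port and start listening.
// Returns the file descriptor, or -errno on failure.
int try_bind_listen(const char* ip, uint16_t port, bool reuse_addr) {
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0)
    return -errno;

  if (reuse_addr) {
    int on = 1;
    if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
      int err = errno;
      ::close(s);
      return -err;
    }
  }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (::inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
    ::close(s);
    return -EINVAL;
  }

  if (::bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      ::listen(s, 16) < 0) {
    int err = errno;
    ::close(s);
    return -err;
  }
  return s;
}

// Hypothetical helper: the local port fd is bound to, or -errno.
// Useful when binding to port 0 and letting the kernel pick a port.
int bound_port(int fd) {
  sockaddr_in addr;
  socklen_t len = sizeof(addr);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0)
    return -errno;
  return ntohs(addr.sin_port);
}
```

To use it: open the first listener with try_bind_listen("0.0.0.0", 0,
true), read its port with bound_port(), then call
try_bind_listen("127.0.0.1", port, true) and see whether you get a valid
descriptor or -EADDRINUSE. Without SO_REUSEADDR on the second socket the
bind should be refused on both platforms.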

