Re: OSD rebind connects to ports of other OSDs

On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> On 20-12-2016 16:23, Sage Weil wrote:
> > On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> >> On 20-12-2016 11:21, Willem Jan Withagen wrote:
> >>> Hi,
> >>>
> >>> I've been banging my head against the wall for some time now.
> >>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
> >>>
> >>> When rebinding it connects to the ports of OSD.1 because those ports are
> >>> the first not in the avoid_list. That should be refused since these
> >>> sockets belong to a different process.
> >>> UNLESS SO_REUSEPORT is set:
> >>>  SO_REUSEPORT allows completely duplicate bindings by multiple processes
> >>>  if they all set SO_REUSEPORT before binding the port.  This option
> >>>  permits multiple instances of a program to each receive UDP/IP
> >>>  multicast or broadcast datagrams destined for the bound port.
> >>>
> >>> That seems to be what is happening here.
> >>> Output from sockstat in this state:
> >>> wjw      ceph-osd-0   43305 14 tcp4   *:6800                *:*
> >>> wjw      ceph-osd-0   43305 15 tcp4   127.0.0.1:6804        *:*
> >>> wjw      ceph-osd-0   43305 16 tcp4   127.0.0.1:6805        *:*
> >>> wjw      ceph-osd-0   43305 45 tcp4   127.0.0.1:6806        *:*
> >>> wjw      ceph-osd-1   43318 14 tcp4   *:6804                *:*
> >>> wjw      ceph-osd-1   43318 15 tcp4   *:6805                *:*
> >>> wjw      ceph-osd-1   43318 16 tcp4   *:6806                *:*
> >>> wjw      ceph-osd-1   43318 17 tcp4   *:6807                *:*
> >>>
> >>> Which clearly demonstrates the mess.
> >>> However, that option is not set anywhere in the Ceph code, nor is it
> >>> a setting that "just" gets set.
> >>>
> >>> Any suggestions where to look for this option to get set in an
> >>> incidental/bug way would be much appreciated.
> >>> Or a suggestion on how to easily debug this.
> >>
> >> Right,
> >>
> >> Compatibility in this area is rather thin. :)
> >>
> >> For the question skip to the end.
> >>
> >> So I'm going to need some functional description, to see if I can get it
> >> right.
> >>
> >> The OSD starts and builds a few messengers with SO_REUSEADDR on the
> >> socket.
> >>  - On Linux, ports that are already taken are reported as in use.
> >>  - On FreeBSD the same happens during startup: the ports are iterated
> >>    through and sequential ports are selected.
> >> So that is how it should be.
> >>
> >> Now when the osd has gone down and comes up, it reports:
> >>   log_channel(cluster) log [WRN] : map e18 wrongly marked me down
> >> on ./src/osd/OSD.cc:7120
> >>
> >> Then it starts rebinding on its messenger connections:
> >>         int r = cluster_messenger->rebind(avoid_ports)
> >> on ./src/osd/OSD.cc:7192.
> >> It calls shutdown_connections() to shutdown all of its connections.
> >>
> >> Somewhere down the line SO_REUSEADDR is set again on the socket and
> >> the socket is bound.
> >>  - Linux grabs the next available ports at the end, because its own
> >>    channels are to be avoided and the rest is taken.
> >>
> >>  - On FreeBSD the first port available is taken. If that is 6800,
> >>    then that is taken, even if the socket is owned by a different
> >>    process, which (per the man page) should require SO_REUSEPORT.
> >>
> >> If I disable SO_REUSEADDR in NetHandler::create_socket()
> >> ====
> >>   /* Make sure connection-intensive things like the benchmark
> >>    * will be able to close/open sockets a zillion of times */
> >>   if (reuse_addr) {
> >>     if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
> >>       lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
> >>                  << strerror(errno) << dendl;
> >>       close(s);
> >>       return -errno;
> >>     }
> >>   }
> >> ====
> >> Then things start to work "as expected": a port is refused when
> >> another listener is already bound to it.
> >>
> >> Doing this has the disadvantage that it is not possible to immediately
> >> kill and restart the OSD, because the ports are not yet released in
> >> the netstat table. But that is a manageable issue, and that time can
> >> be shortened by setting a sysctl.
> >>
> >> So the question is:
> >>  - how much rebinding is required.....
> > 
> > I think it's just for tests.  My recollection is that we did this just 
> > because we can run out of ports since we can't reuse one until the tcp 
> > finwait2 (or whatever) timeout expires.
> > 
> >>  - And why do we set SO_REUSEADDR if we are going to add the ports to
> >>    avoid_ports, so that a completely new port is required anyway?
> > 
> > I suspect it's safe to drop the option if the Linux vs FreeBSD semantics 
> > are in fact different.
> 
> When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
> Will submit a PR.

Please #ifdef it so it's only excluded for FreeBSD.
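
Something along these lines is what I have in mind. This is only a rough
standalone sketch, not the actual NetHandler::create_socket(): the helper
name create_tcp_socket() is made up for illustration, the logging is left
out, and it assumes __FreeBSD__ is the right macro to test. Unlike the
snippet quoted above, it saves errno before close(), since close() may
overwrite it.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper for illustration only; not the Ceph implementation.
// Returns a TCP socket fd on success, or -errno on failure.
int create_tcp_socket(bool reuse_addr)
{
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0)
    return -errno;

#if !defined(__FreeBSD__)
  // Everywhere except FreeBSD: keep SO_REUSEADDR so that a restarted
  // daemon does not have to wait for old connections to time out.
  if (reuse_addr) {
    int on = 1;
    if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
      int err = errno;  // save before close() can clobber it
      ::close(s);
      return -err;
    }
  }
#else
  // FreeBSD: skip the option so that bind() refuses a port that already
  // has a listener in another process (as observed in this thread).
  (void)reuse_addr;
#endif

  return s;
}
```

The caller stays the same on both platforms; only the option is skipped
on FreeBSD, at the cost of the slower restart mentioned above.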

Thanks!
sage