On 20-12-2016 19:43, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>> On 20-12-2016 16:23, Sage Weil wrote:
>>> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>>>> On 20-12-2016 11:21, Willem Jan Withagen wrote:
>>>>> Hi,
>>>>>
>>>>> I've been banging my head against the wall for some time now.
>>>>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>>>>
>>>>> When rebinding it connects to the ports of OSD.1 because those ports are
>>>>> the first not in the avoid_list. That should be refused since these
>>>>> sockets belong to a different process.
>>>>> UNLESS SO_REUSEPORT is set:
>>>>>     SO_REUSEPORT allows completely duplicate bindings by multiple processes
>>>>>     if they all set SO_REUSEPORT before binding the port. This option
>>>>>     permits multiple instances of a program to each receive UDP/IP
>>>>>     multicast or broadcast datagrams destined for the bound port.
>>>>>
>>>>> Which seems to be what happens.
>>>>> Output from sockstat in this state:
>>>>> wjw  ceph-osd-0  43305  14  tcp4  *:6800          *:*
>>>>> wjw  ceph-osd-0  43305  15  tcp4  127.0.0.1:6804  *:*
>>>>> wjw  ceph-osd-0  43305  16  tcp4  127.0.0.1:6805  *:*
>>>>> wjw  ceph-osd-0  43305  45  tcp4  127.0.0.1:6806  *:*
>>>>> wjw  ceph-osd-1  43318  14  tcp4  *:6804          *:*
>>>>> wjw  ceph-osd-1  43318  15  tcp4  *:6805          *:*
>>>>> wjw  ceph-osd-1  43318  16  tcp4  *:6806          *:*
>>>>> wjw  ceph-osd-1  43318  17  tcp4  *:6807          *:*
>>>>>
>>>>> Which clearly demonstrates the mess.
>>>>> However, that option is set nowhere in the ceph code, nor is it a
>>>>> setting that "just" gets set.
>>>>>
>>>>> Any suggestions where to look for this option to get set in an
>>>>> incidental/bug way would be much appreciated.
>>>>> Or a suggestion on how to easily debug this.
>>>>
>>>> Right,
>>>>
>>>> Compatibility in this area is rather thin. :)
>>>>
>>>> For the question skip to the end.
>>>>
>>>> So I'm going to need some functional description, to see if I can get it
>>>> right.
>>>>
>>>> The OSD starts and builds a few messengers with SO_REUSEADDR on the socket.
>>>> On Linux, ports already in use are reported as in use.
>>>> The same holds on FreeBSD during startup. Ports are nicely iterated through
>>>> and sequential ports are selected.
>>>> So that is how it should be.
>>>>
>>>> Now when the osd has gone down and comes up, it reports:
>>>>     log_channel(cluster) log [WRN] : map e18 wrongly marked me down
>>>> on ./src/osd/OSD.cc:7120
>>>>
>>>> Then it starts rebinding on its messenger connections:
>>>>     int r = cluster_messenger->rebind(avoid_ports)
>>>> on ./src/osd/OSD.cc:7192.
>>>> It calls shutdown_connections() to shut down all of its connections.
>>>>
>>>> Somewhere down the line SO_REUSEADDR is set again on the socket and the
>>>> socket is bound.
>>>> - Linux grabs the next available ports at the end, because its own
>>>>   channels are to be avoided and the rest is taken.
>>>>
>>>> - On FreeBSD the first port available is taken. If that is 6800,
>>>>   then that is taken. Even if the socket is owned by a different
>>>>   process. Which (per man-page) would require SO_REUSEPORT.
>>>>
>>>> If I disable SO_REUSEADDR in NetHandler::create_socket()
>>>> ====
>>>> /* Make sure connection-intensive things like the benchmark
>>>>  * will be able to close/open sockets a zillion of times */
>>>> if (reuse_addr) {
>>>>   if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
>>>>     lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
>>>>                << strerror(errno) << dendl;
>>>>     close(s);
>>>>     return -errno;
>>>>   }
>>>> }
>>>> ====
>>>> Then things start to work "as expected" and ports are refused when they
>>>> have a listener connected.
>>>>
>>>> Doing this has the disadvantage that it is not possible to immediately
>>>> kill and restart the OSD because the ports are not yet released in the
>>>> netstat table.... But that is a manageable issue, and that time can be
>>>> shortened by setting a sysctl.
>>>>
>>>> So the question is:
>>>> - how much rebinding is required.....
>>>
>>> I think it's just for tests. My recollection is that we did this just
>>> because we can run out of ports since we can't reuse one until the tcp
>>> finwait2 (or whatever) timeout expires.
>>>
>>>> - And why do we set SO_REUSEADDR if we are going to add the ports to
>>>> avoid_ports. And thus a completely new port is required.
>>>
>>> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
>>> are in fact different.
>>
>> When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
>> Will submit a PR.
>
> Please #ifdef it so it's only excluded for FreeBSD.

Yup, in #12593

--WjW
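
For reference, a minimal sketch of what such a FreeBSD-only guard could look like. This is an illustration only, not the change in #12593: create_socket_sketch(), reuse_addr_enabled() and kAllowReuseAddr are hypothetical names, and logging goes to stderr instead of lderr() so the snippet builds standalone. Saving errno before close() is my own choice and not something discussed in the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical compile-time switch: honour reuse_addr everywhere
// except on FreeBSD, where the option is skipped entirely.
#if defined(__FreeBSD__)
constexpr bool kAllowReuseAddr = false;
#else
constexpr bool kAllowReuseAddr = true;
#endif

// Hypothetical stand-in for NetHandler::create_socket().
// Returns a socket fd on success, or -errno on failure.
int create_socket_sketch(bool reuse_addr) {
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s == -1)
    return -errno;

  if (reuse_addr && kAllowReuseAddr) {
    int on = 1;
    if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
      int err = errno;  // save errno before close() can overwrite it
      std::fprintf(stderr, "setsockopt SO_REUSEADDR failed: %s\n",
                   std::strerror(err));
      ::close(s);
      return -err;
    }
  }
  return s;
}

// Reports whether SO_REUSEADDR is currently set on the socket.
bool reuse_addr_enabled(int s) {
  assert(s >= 0);
  int val = 0;
  socklen_t len = sizeof(val);
  if (::getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, &len) == -1)
    return false;
  return val != 0;  // some platforms report the flag bit rather than 1
}
```

With this shape the callers keep passing reuse_addr as before; only the platform switch decides whether the option is actually applied.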