This is caused by the fact that, when bind-insecure is turned on (which is
now the default), the brick may not be able to bind to the port assigned
to it by Glusterd, for example 49192-49195. This seems to occur because
the rpc_clnt connections bind to ports in the same range, so the brick
fails to bind to a port that is already in use by someone else.

This bug already existed before http://review.gluster.org/#/c/11039/ when
using rdma; i.e., even previously, rdma would bind to a port >= 1024 if it
could not find a free port < 1024, even when bind-insecure was turned off
(refer to commit '0e3fd04e'). Since we don't have tests related to rdma,
we did not discover this issue earlier. http://review.gluster.org/#/c/11039/
only exposed the bug we encountered. It can now be fixed by
http://review.gluster.org/#/c/11512/, which makes rpc_clnt request port
numbers starting from 65535 in descending order. As a result, port clashes
are minimized, and it fixes the issue in rdma as well.

Thanks to Raghavendra Talur for help in discovering the real cause.

Regards,
Prasanna Kalever

----- Original Message -----
From: "Raghavendra Talur" <raghavendra.talur@xxxxxxxxx>
To: "Krishnan Parthasarathi" <kparthas@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Thursday, July 2, 2015 6:45:17 PM
Subject: Re: spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur
<raghavendra.talur@xxxxxxxxx> wrote:

> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi
> <kparthas@xxxxxxxxxx> wrote:
>
> > > A port assigned by Glusterd for a brick is found to be in use
> > > already by the brick. Any changes in Glusterd recently which can
> > > cause this?
> > >
> > > Or is it a test infra problem?
> >
> > This issue is likely to be caused by http://review.gluster.org/11039.
> > This patch changes the port allocation that happens for rpc_clnt-based
> > connections. Previously, the ports allocated were < 1024. With this
> > change, these connections (typically mount processes, gluster-nfs
> > server processes, etc.) could end up using the ports that bricks are
> > being assigned. IIUC, the intention of the patch was to make server
> > processes lenient to inbound messages from ports > 1024. If we don't
> > need to use ports > 1024, we could leave the port allocation for
> > rpc_clnt connections as before.
> >
> > Alternately, we could reserve the range of ports starting from 49152
> > for bricks by setting net.ipv4.ip_local_reserved_ports using
> > sysctl(8). This is specific to Linux; I'm not aware of how this could
> > be done on NetBSD, for instance.
>
> It seems this is exactly what's happening.
>
> I have a question. I get the following data from netstat and grep:
>
> tcp   0  0  f6be17c0fbf5:1023   f6be17c0fbf5:24007  ESTABLISHED  31516/glusterfsd
> tcp   0  0  f6be17c0fbf5:49152  f6be17c0fbf5:490    ESTABLISHED  31516/glusterfsd
> unix  3  [ ]  STREAM  CONNECTED  988353  31516/glusterfsd  /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid. Looking at the data, line 2 is very clear:
> it shows the connection between the brick and a glusterfs client. The
> unix socket on line 3 is also clear: it is the unix socket connection
> that glusterd and the brick process use for communication.
>
> I am not able to understand line 1; which part of the brick process
> established a tcp connection with glusterd using port 1023?
>
> Note: this data is from a build which does not have the above-mentioned
> patch.
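The source-port selection being discussed amounts to a bind() loop that
scans downward from a ceiling port. Below is a minimal standalone sketch
under that assumption; the helper name bind_to_port_descending and the
loop structure are illustrative, not the actual rpc_clnt code. With a
ceiling of 1023 it mimics the old privileged-port behaviour (hence the
:1023 source port in line 1 of the netstat output above); with a ceiling
of 65535 it corresponds to the descending allocation proposed in
http://review.gluster.org/#/c/11512/.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Bind 'sock' to the highest free port <= 'ceiling'; return the port
 * bound to, or -1 if no port could be bound. */
static int
bind_to_port_descending(int sock, int ceiling)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    for (int port = ceiling; port > 0; port--) {
        addr.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return port;            /* found a free port */
        if (errno != EADDRINUSE)
            return -1;              /* real failure, stop scanning */
    }
    return -1;                      /* every candidate port was busy */
}

int
main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0) {
        perror("socket");
        return 1;
    }

    int port = bind_to_port_descending(sock, 65535);

    if (port < 0)
        perror("bind_to_port_descending");
    else
        printf("bound to source port %d\n", port);

    close(sock);
    return 0;
}

With clients scanning down from 65535 while Glusterd assigns brick ports
upward from 49152, the two allocators only meet after tens of thousands
of ports are consumed, which is why the clash is minimized rather than
eliminated outright.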
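The reservation alternative suggested above can be expressed the same
way; this sketch just writes the range to the Linux procfs knob behind
sysctl(8) and is equivalent to
`sysctl -w net.ipv4.ip_local_reserved_ports=49152-49664`. The upper
bound 49664 is illustrative (size it to the number of bricks expected),
and as noted in the thread this is Linux-specific.

#include <stdio.h>

int
main(void)
{
    const char *knob = "/proc/sys/net/ipv4/ip_local_reserved_ports";
    FILE *f = fopen(knob, "w");   /* needs root */

    if (f == NULL) {
        perror(knob);
        return 1;
    }

    /* Ports in this list are skipped during automatic (ephemeral)
     * allocation, but explicit binds to them (as the bricks do) are
     * still allowed. The range is illustrative. */
    fprintf(f, "49152-49664\n");

    return fclose(f) == 0 ? 0 : 1;
}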
The patch which exposed this bug is being reverted until the underlying
bug is also fixed. You can monitor the revert patches here:

master:     http://review.gluster.org/11507
3.7 branch: http://review.gluster.org/11508

Please rebase your patches after the above patches are merged, to ensure
that your patches pass regression.

--
Raghavendra Talur

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel