Re-initiating this discussion, as we have changed the approach from the previous solution.

Earlier we were thinking of allocating the brick ports afresh on every restart, but that may break production systems, especially where admins are strict about which ports get opened. Patch [1] was committed to master, but we have now reverted it with patch [2].

New solution:

To avoid port conflicts between bricks and clients, we now propose three different patches:

a. Ensure that the client port range doesn't clash with the brick port range. This is in [3], and the commit message has a good explanation of what it does.

b. Add a safeguard in GlusterD to check whether the already allocated brick port is free (in a restart/reboot scenario). Additionally, the current implementation of pmap_port_isfree () issues a bind () call to check the port and then frees it. However, this can introduce a race where the brick fails to bind the port because the kernel hasn't released it by that time. We are looking to change this function so that it just issues a connect () call to determine whether the port is usable. Patch [4] tracks this change.

c. If the brick process still fails to bind the port and reports the failure back, GlusterD will retry with a fresh port. We are also thinking of capping this at 3 attempts. This patch is not yet ready.

Along with the above, there is one more patch [5], which tries to utilize the ports in a better way.

Your comments/suggestions are very much needed and appreciated.

~Atin

[1] http://review.gluster.org/13865
[2] http://review.gluster.org/13989
[3] http://review.gluster.org/#/c/13998
[4] http://review.gluster.org/#/c/13990
[5] http://review.gluster.org/#/c/10785

On 03/31/2016 07:24 PM, Atin Mukherjee wrote:
> As of now GlusterD maintains its own portmap table and is responsible
> for allocating ports for services such as brick processes and snapd.
> The flow with the portmap goes like this:
>
> When a volume start is triggered, GlusterD checks whether the brick has
> already been assigned a port earlier. If so, the same port is passed to
> the brick process; otherwise GlusterD picks a free port from the
> portmap table.
>
> Now if the node reboots, GlusterD first starts the daemons, followed by
> the brick processes. Given that a brick process tries to bind to the
> same persisted port, there is no guarantee that the port hasn't been
> consumed by some other application (gluster or otherwise), and this is
> exactly what we noticed in one of the BZs [1]. We hit this very
> frequently when the number of brick processes goes high.
>
> I think bringing up a process that binds to a persisted port is not a
> good idea, since it's prone to fail considering that processes
> (clients) contend for the same ports.
>
> I've sent a patch [2] which follows the same approach that snapd
> currently uses for its port. The ports will continue to be persisted,
> but on every brick restart a fresh port will be allocated for the
> brick. The only reason for persisting the port is so that GlusterD can
> attempt to remove it from the portmap in case the brick wasn't shut
> down gracefully and pmap_registry_remove () wasn't invoked.
>
> We'd also need another patch [3] to get this to work, as currently we
> don't mark the port as free in pmap_registry_remove.
>
> Please note that [2] doesn't fully eliminate the probability of another
> process stealing the port allocated by GlusterD, as there is still a
> small time window between GlusterD allocating the port and the brick
> process binding to it.
>
> As a complete/long term solution, we think that GlusterD has to give up
> managing the port allocation; it should be done by the brick/daemon
> process, with GlusterD only doing the bookkeeping of those ports.
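
To make the allocation flow quoted above concrete, here is a minimal Python sketch. It is not GlusterD code: the PortMap class, its methods and the port numbers are all hypothetical and only mirror the description (reuse the persisted port if there is one, otherwise take the first free entry from the table).

```python
class PortMap:
    """Toy stand-in for a portmap table (hypothetical, not GlusterD's)."""

    def __init__(self, base, last):
        self.base = base          # first port in the managed range
        self.last = last          # last port in the managed range
        self.owner = {}           # port -> brick name

    def allocate(self, brick, persisted=None):
        # A brick that was assigned a port earlier gets the same port back.
        if persisted is not None:
            self.owner[persisted] = brick
            return persisted
        # Otherwise hand out the first port the table believes is free.
        for port in range(self.base, self.last + 1):
            if port not in self.owner:
                self.owner[port] = brick
                return port
        raise RuntimeError("no free port left in the managed range")

    def remove(self, port):
        # Mark the port as free again so a later allocate() can reuse it.
        self.owner.pop(port, None)
```

Note that the table only tracks ports it handed out itself. It has no view of what other processes on the node have bound, which is the gap described in the quoted text.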
> Your comments and suggestions are more than welcome here :)
>
> ~Atin
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1322805
> [2] http://review.gluster.org/#/c/13865/
> [3] http://review.gluster.org/#/c/10785/
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
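
For readers following proposals (b) and (c) in the new solution, the two ideas can be sketched roughly as below. This is an illustration in Python, not the code in the patches under review: port_in_use, start_brick_with_retry and their parameters are hypothetical names, and the cap of 3 attempts is taken from the proposal text.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Probe with connect() instead of bind(): True if something accepts
    TCP connections on host:port. Nothing is bound by the probe, so there
    is no port for the kernel to release afterwards."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex() returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


def start_brick_with_retry(start_brick, pick_port, max_attempts=3):
    """Try to start a brick on a fresh port, at most max_attempts times.

    start_brick(port) -> bool and pick_port() -> int are caller-supplied
    callbacks (hypothetical). Returns the port that worked, or None.
    """
    for _ in range(max_attempts):
        port = pick_port()
        if start_brick(port):
            return port
    return None
```

One limitation of a connect-based probe, as far as I can tell: it only detects a listener. A port that is bound but not listening, or one in use as the local end of an outgoing client connection, would still look free, which is one reason the retry in (c) remains useful.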