Re: Brick port allocation by GlusterD

Re-initiating this discussion, as we have changed the approach from the
previous solution:

Earlier we were thinking of allocating a fresh brick port on every
restart, but that may break production systems, especially where admins
are strict about which ports they open. Patch [1] was committed to
master, but we have now reverted it with patch [2].

New solution:

To avoid port conflicts between bricks and clients, we now propose three
separate patches:

a. Ensure that the client port range doesn't clash with the brick port
range. This is in [3], and the commit message has a good explanation of
what it does.

b. Add a safeguard in GlusterD to check whether the already allocated
brick port is still free (in a restart/reboot scenario). Additionally,
the current implementation of pmap_port_isfree () issues a bind () call
to check the port and then releases it. However, this is itself prone to
a race: the brick can fail to bind the port because the kernel hasn't
freed it up by that time. We are looking to change this function so that
it just issues a connect () call to determine whether the port is
usable. Patch [4] tracks this change.

c. If the brick process still fails to bind the port and reports the
failure back, GlusterD will retry with a fresh port. We are also
thinking of capping this at a maximum of 3 attempts. The patch is not
yet ready.
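
To make the idea in (b) concrete, here is a minimal sketch of a
connect ()-based probe. It is not the code from patch [4]; the function
name port_has_listener and the choice of probing only the IPv4 loopback
address are assumptions made for illustration. Note that a probe like
this only detects a process that is listening on the port; a socket that
is bound but not listening, or a listener bound to a different address,
would not be seen.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not GlusterD code).
 * Returns  1 if something accepts TCP connections on 127.0.0.1:port,
 *          0 if the connect () attempt fails (e.g. connection refused),
 *         -1 if the probe socket could not be created.
 * Unlike a bind ()-based check, this never holds the port itself, so it
 * cannot leave the port temporarily unavailable to the brick.
 */
int port_has_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* assumption: local probe */

    int rc = connect(sock, (struct sockaddr *)&addr, sizeof(addr));
    close(sock);

    return rc == 0 ? 1 : 0;
}
```

A caller would treat a return value of 0 as "probably usable" rather
than as a guarantee, since another process can still grab the port
between the probe and the brick's own bind ().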

Along with the above, there is one more patch [5] which tries to utilize
the ports in a better way.
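
The retry behaviour described in (c) could look roughly like the sketch
below. Since that patch is not ready, everything here is hypothetical:
the callback types, the function start_brick_with_retry, and the reading
of "max 3 attempts" as three start attempts in total, including the
first one on the persisted port. The sim_* callbacks are toy stand-ins
for the real port allocator and brick launcher.

```c
#include <stddef.h>

/* Assumption: "max 3 attempts" counts the first try on the old port. */
#define MAX_BRICK_START_ATTEMPTS 3

/* Hypothetical callbacks: allocate a fresh port (or -1 if none left),
 * and start the brick on a port (0 on success, -1 on bind failure). */
typedef int (*alloc_port_fn)(void *ctx);
typedef int (*start_brick_fn)(void *ctx, int port);

/* Try the persisted port first; on failure, retry with fresh ports.
 * Returns the port the brick started on, or -1 if all attempts failed. */
int start_brick_with_retry(alloc_port_fn alloc_port,
                           start_brick_fn start_brick,
                           void *ctx, int persisted_port)
{
    int port = persisted_port;

    for (int attempt = 1; attempt <= MAX_BRICK_START_ATTEMPTS; attempt++) {
        if (start_brick(ctx, port) == 0)
            return port;
        if (attempt == MAX_BRICK_START_ATTEMPTS)
            break;              /* do not allocate a port we won't use */
        port = alloc_port(ctx);
        if (port < 0)
            break;              /* allocator has no free port left */
    }
    return -1;
}

/* Toy simulation, for illustration only. */
struct sim {
    int next_port;   /* next port the fake allocator hands out */
    int first_good;  /* ports below this value "fail to bind" */
    int starts;      /* number of start attempts made so far */
};

int sim_alloc(void *ctx)
{
    struct sim *s = ctx;
    return s->next_port++;
}

int sim_start(void *ctx, int port)
{
    struct sim *s = ctx;
    s->starts++;
    return port >= s->first_good ? 0 : -1;
}
```

With the toy callbacks, a call such as
start_brick_with_retry(sim_alloc, sim_start, &s, 49152) first tries the
persisted port and then moves on to fresh ports until one works or the
attempt limit is reached.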

Your comments/suggestions are very much needed and appreciated.

~Atin

[1] http://review.gluster.org/13865
[2] http://review.gluster.org/13989
[3] http://review.gluster.org/#/c/13998
[4] http://review.gluster.org/#/c/13990
[5] http://review.gluster.org/#/c/10785


On 03/31/2016 07:24 PM, Atin Mukherjee wrote:
> As of now GlusterD maintains its own portmap table and is responsible
> for allocating ports for the services like brick process, snapd. The
> flow with the portmap goes like this:
> 
> When a volume start is triggered, GlusterD checks whether the brick has
> already been assigned a port. If so, the same port is passed to the
> brick process; otherwise GlusterD picks a free port from the portmap
> table.
> 
> Now say the node reboots: GlusterD first starts the daemons, followed
> by the brick processes. Since the brick process tries to bind to the
> same persisted port, there is no guarantee that the port hasn't already
> been consumed by some other application (gluster or not), and this is
> exactly what we noticed in one of the BZs [1]. We hit this very
> frequently when the number of brick processes is high.
> 
> I think bringing up a process that binds to a persisted port is not a
> good idea, since it's prone to fail when other processes (clients)
> contend for the same ports.
> 
> I've sent a patch [2] which follows the same approach that snapd
> currently uses for its port: the ports will continue to be persisted,
> but on every brick restart a fresh port will be allocated for the
> brick. The only reason for persisting the port is so that its removal
> from the portmap can be attempted in case the brick wasn't shut down
> gracefully and pmap_registry_remove () wasn't invoked.
> 
> We'd also need another patch [3] to get this to work, as currently we
> don't mark the port as free in pmap_registry_remove.
> 
> Please note that [2] doesn't fully eliminate the possibility of another
> process stealing the port allocated by GlusterD, as there is still a
> small time window between GlusterD allocating the port and the brick
> process binding to it.
> 
> As a complete/long-term solution, we think that GlusterD has to give up
> managing port allocation; that should be done by the brick/daemon
> process itself, with GlusterD only doing the bookkeeping of those ports.
> 
> Your comments/suggestions are more than welcome here :)
> 
> ~Atin
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1322805
> [2] http://review.gluster.org/#/c/13865/
> [3] http://review.gluster.org/#/c/10785/
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 