Re: regression failed : snapshot/bug-1316437.t

On Mon, Jul 25, 2016 at 04:34:17PM +0530, Avra Sengupta wrote:
> The crux of the problem is that as of today, brick processes on restart try
> to reuse the old port they were using (assuming that no other process will
> be using it, and not consulting pmap_registry_alloc() before using it). With
> a recent change, pmap_registry_alloc() reassigns older ports that were
> used, but are now free. Hence snapd now gets a port that was previously used
> by a brick and tries to bind to it, whereas the older brick process without
> consulting pmap table blindly tries to connect to it, and hence we see this
> problem.
> 
> Now coming to the fix, I feel the brick process should not try to get the
> older port and should just take a new port every time it comes up. We will
> not run out of ports with this change because pmap now allocates old ports
> again, and the previous port used by the brick process will eventually be
> reused. If anyone sees any concern with this approach, please feel free to
> raise it now.
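
To make the race described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not GlusterD's actual pmap code: `PortRegistry`, `simulate()` and the port range are hypothetical names and values chosen for illustration, and the `bound` dict merely stands in for what the kernel considers bound.

```python
class PortRegistry:
    """Toy stand-in for a port map; NOT GlusterD's real pmap implementation."""

    def __init__(self, base, last):
        self.base = base
        self.last = last
        self.owner = {}  # port -> name of the process the registry assigned it to

    def alloc(self, name):
        # Hand out the lowest free port, so previously freed ports get reassigned.
        for port in range(self.base, self.last + 1):
            if port not in self.owner:
                self.owner[port] = name
                return port
        raise RuntimeError("no free ports")

    def release(self, port):
        self.owner.pop(port, None)


def simulate(brick_reuses_old_port):
    """Return True if snapd manages to bind, False on 'address already in use'."""
    registry = PortRegistry(49152, 49160)  # hypothetical port range
    bound = {}  # port -> process actually bound (stand-in for the kernel's view)

    def bind(port, name):
        if port in bound:
            return False  # models EADDRINUSE
        bound[port] = name
        return True

    # The brick starts, gets a port from the registry and binds to it.
    brick_port = registry.alloc("brick")
    bind(brick_port, "brick")

    # The brick goes down; its port is released in the registry.
    del bound[brick_port]
    registry.release(brick_port)

    if brick_reuses_old_port:
        # Restart without consulting the registry: bind the remembered port.
        bind(brick_port, "brick")
    else:
        # Proposed behaviour: ask the registry for a port on every start.
        bind(registry.alloc("brick"), "brick")

    # snapd asks the registry for a port and tries to bind to it.
    snapd_port = registry.alloc("snapd")
    return bind(snapd_port, "snapd")
```

In this model `simulate(True)` fails the snapd bind because the registry hands out the port the restarted brick silently took, while `simulate(False)` succeeds because both processes go through the registry.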

I wonder how this is handled with reconnecting clients. If a client
thinks it was connected to a brick, but the connection was lost, does it
try to connect to the same port again? I don't know if it really connects
to the pmap service in GlusterD to find the new/updated port...

Niels


> 
> While awaiting feedback from you guys, I have sent this patch
> (http://review.gluster.org/15001), which moves the said test case to bad
> tests for now, and after we collectively reach a conclusion on the fix,
> we will remove this from bad tests.
> 
> Regards,
> Avra
> 
> On 07/25/2016 02:33 PM, Avra Sengupta wrote:
> > The failure suggests that the port snapd is trying to bind to is already
> > in use. But snapd has been modified to use a new port every time. I am
> > looking into this.
> > 
> > On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
> > > More failures:
> > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console
> > > 
> > > I see these messages in the snapd.log:
> > > 
> > > [2016-07-22 05:31:52.482282] I
> > > [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service:
> > > Configured rpc.outstanding-rpc-limit with value 64
> > > [2016-07-22 05:31:52.482352] W [MSGID: 101002]
> > > [options.c:954:xl_opt_validate] 0-patchy-server: option
> > > 'listen-port' is deprecated, preferred is
> > > 'transport.socket.listen-port', continuing with correction
> > > [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind]
> > > 0-tcp.patchy-server: binding to  failed: Address already in use
> > > [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind]
> > > 0-tcp.patchy-server: Port is already in use
> > > [2016-07-22 05:31:52.482459] W
> > > [rpcsvc.c:1630:rpcsvc_create_listener] 0-rpc-service: listening on
> > > transport failed
> > > [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init]
> > > 0-patchy-server: creation of listener failed
> > > [2016-07-22 05:31:52.482481] E [MSGID: 101019]
> > > [xlator.c:433:xlator_init] 0-patchy-server: Initialization of volume
> > > 'patchy-server' failed, review your volfile again
> > > [2016-07-22 05:31:52.482491] E [MSGID: 101066]
> > > [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing
> > > translator failed
> > > [2016-07-22 05:31:52.482499] E [MSGID: 101176]
> > > [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
> > > 
> > > On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey <aspandey@xxxxxxxxxx> wrote:
> > > 
> > >     Hi,
> > > 
> > >     The following test has failed 3 times in the last two days:
> > > 
> > >     ./tests/bugs/snapshot/bug-1316437.t
> > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
> > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
> > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull
> > > 
> > >     Please take a look at it and check whether it is a spurious failure or not.
> > > 
> > >     Ashish
> > > 
> > >     _______________________________________________
> > >     Gluster-devel mailing list
> > >     Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
> > >     http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > > 
> > 
> 


