I've noticed this as well on the official 3.8.4 gluster packages from Red Hat.

# gluster v status
Status of volume: aevmstorage
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ae-vmstore01-rbe:/bricks/brick1/brick  49153     0          Y       2659
Brick ae-vmstore02-rbe:/bricks/brick1/brick  49152     0          Y       2651
Brick ae-vmstore03-rbe:/bricks/brick1/brick  49153     0          Y       72876
NFS Server on localhost                      2049      0          Y       3389
Self-heal Daemon on localhost                N/A       N/A        Y       3398
NFS Server on ae-vmstore02-rbe               2049      0          Y       2675
Self-heal Daemon on ae-vmstore02-rbe         N/A       N/A        Y       2848
NFS Server on ae-vmstore03-rbe               2049      0          Y       156988
Self-heal Daemon on ae-vmstore03-rbe         N/A       N/A        Y       156996

Task Status of Volume aevmstorage
------------------------------------------------------------------------------
There are no active volume tasks

Deigo

On Fri, Sep 22, 2017 at 11:28 AM, Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx> wrote:
> Hi Darrell,
>
> Thanks. For us it's really easy to reproduce at the moment; each restart or
> stop/start causes the issue over here.
>
> Atin will look into it on Monday, fortunately :)
>
> Regards
> Jo
>
>
> -----Original message-----
> From: Darrell Budic <budic@xxxxxxxxxxxxxxxx>
> Sent: Fri 22-09-2017 17:24
> Subject: Re: BUG: After stop and start wrong port is advertised
> To: Atin Mukherjee <amukherj@xxxxxxxxxx>;
> CC: Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx>; gluster-users@xxxxxxxxxxx;
>
> I encountered this once in the past; an additional symptom was that peers
> were in the disconnected state on the peers that were NOT using the wrong
> ports. The disconnected peers were how I detected it in the first place.
>
> It happened to me after rebooting, and I fixed it, but I wasn’t able to stop
> and gather debugging info at the time.
>
> The problem seemed to be that the volume files in
> /var/lib/glusterd/vols/<vol-name>/bricks/<server name>\:-v0-<vol name>-brick0
> were not updated to reflect a new port # after the restart (and the port
> numbers had changed due to adding and deleting volumes since the last
> start). I stopped glusterd, killed any remaining glusterfsd’s, hand-edited
> the files to reflect the new ports they thought they were running the bricks
> on (from vol info I think, maybe log files), and restarted glusterd; then
> everything was happy again.
>
> Hope it helps. It sounds like it may be a bug to me too if others are seeing
> it.
>
>   -Darrell
>
>
>> On Sep 22, 2017, at 8:10 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>>
>> I've already replied to your earlier email. In case you've not seen it in
>> your mailbox, here it goes:
>>
>> This looks like a bug to me. For some reason glusterd's portmap is
>> referring to a stale port (IMO) whereas the brick is still listening on the
>> correct port. But ideally, when the glusterd service is restarted, all the
>> in-memory portmap is rebuilt. I'd request the following details from you
>> so we can start analysing it:
>>
>> 1. glusterd statedump output from 192.168.140.43. You can use kill
>>    -SIGUSR2 <pid of glusterd> to request a statedump; the file will be
>>    available in /var/run/gluster.
>> 2. glusterd and brick logfiles for 192.168.140.43:/gluster/public from
>>    192.168.140.43
>> 3. cmd_history logfile from all the nodes.
>> 4. Content of /var/lib/glusterd/vols/public/
>>
>>
>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens
>> <jo.goossens@xxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>>
>> We use glusterfs 3.10.5 on Debian 9.
>>
>> When we stop or restart the service (e.g. service glusterfs-server
>> restart), we see that the wrong port gets advertised afterwards.
>> For example:
>>
>> Before restart:
>>
>> Status of volume: public
>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public   49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public   49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public   49152     0          Y       5913
>> Self-heal Daemon on localhost          N/A       N/A        Y       5932
>> Self-heal Daemon on 192.168.140.42     N/A       N/A        Y       13084
>> Self-heal Daemon on 192.168.140.41     N/A       N/A        Y       15499
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> After a restart of the service on one of the nodes (192.168.140.43), the
>> port seems to have changed (but it didn't):
>>
>> root@app3:/var/log/glusterfs# gluster volume status
>> Status of volume: public
>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.140.41:/gluster/public   49153     0          Y       6364
>> Brick 192.168.140.42:/gluster/public   49152     0          Y       1483
>> Brick 192.168.140.43:/gluster/public   49154     0          Y       5913
>> Self-heal Daemon on localhost          N/A       N/A        Y       4628
>> Self-heal Daemon on 192.168.140.42     N/A       N/A        Y       3077
>> Self-heal Daemon on 192.168.140.41     N/A       N/A        Y       28777
>>
>> Task Status of Volume public
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> However, the active process is STILL the same pid AND still listening on
>> the old port:
>>
>> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
>> tcp   0   0 0.0.0.0:49152   0.0.0.0:*   LISTEN   5913/glusterfsd
>>
>> The other nodes' logs fill up with errors because they can't reach the
>> daemon anymore.
>> They try to reach it on the "new" port instead of the old one:
>>
>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>> refused); disconnecting socket
>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>> 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>> refused); disconnecting socket
>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>> 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish]
>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>> refused); disconnecting socket
>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>> 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish]
>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>> refused); disconnecting socket
>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>> 0-public-client-2: changing port to 49154 (from 0)
>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish]
>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>> refused); disconnecting socket
>>
>> So they now try 49154 instead of the old 49152.
>>
>> Is this also by design? We had a lot of issues because of this recently.
>> We don't understand why it starts advertising a completely wrong port
>> after stop/start.
>>
>> Regards
>> Jo Goossens

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
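
A minimal shell sketch of the checks Atin asks for and the manual workaround
Darrell describes above, for anyone hitting the same mismatch. The volume name
"public", the brick info filename 192.168.140.43:-gluster-public (derived from
the path pattern Darrell mentions), and the listen-port= key inside that file
are assumptions for illustration and are not confirmed in this thread; verify
them against your own /var/lib/glusterd layout before editing anything.

#!/bin/bash
# Run on the node whose brick port is advertised wrongly (here 192.168.140.43).
VOL=public
BRICK_FILE=/var/lib/glusterd/vols/$VOL/bricks/192.168.140.43:-gluster-public

# 1. Port glusterd advertises to clients vs. port the brick really listens on.
gluster volume status $VOL
netstat -tapn | grep glusterfsd

# 2. Port recorded in the on-disk brick info file (assumed key name).
grep 'listen-port' "$BRICK_FILE"

# 3. glusterd statedump, as Atin requests; the dump file lands in
#    /var/run/gluster and includes the in-memory portmap state.
kill -SIGUSR2 "$(pidof glusterd)"

# 4. Darrell's workaround, only if the advertised port and the real port
#    disagree (adapt the port number to what the brick actually uses; note
#    that killing glusterfsd takes the brick offline until glusterd restarts):
# service glusterfs-server stop
# pkill glusterfsd
# sed -i 's/^listen-port=.*/listen-port=49152/' "$BRICK_FILE"
# service glusterfs-server start

Comparing the three views (volume status, the brick info file, and the
statedump/netstat output) should show whether the stale side is glusterd's
in-memory portmap or the on-disk brick file, which is exactly the question
Atin's requested data is meant to answer.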