Re: BUG: After stop and start wrong port is advertised

 



Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release notes [0][1][2]. I've attached a tarball of all of yesterday's logs from /var/log/glusterd on one of the affected nodes (called "wingu3"). I hope that's what you need.

[0] https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748

Thanks,

On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
The patch was definitely there in 3.12.3. Do you have the glusterd and brick logs handy with you when this happened?

On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <alan.orth@xxxxxxxxx> wrote:
For what it's worth, I just updated some CentOS 7 servers from GlusterFS 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into 3.12.4? I had to use Mike Hulsman's script to check the daemon port against the port in the volume's brick info, update the port, and restart glusterd on each node. Luckily I only have four servers! Hoping I don't have to do this every time I reboot!

Regards,

On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
On Sat, 2 Dec 2017 at 19:29, Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx> wrote:

Hello Atin,

 

 

Could you confirm this should have been fixed in 3.10.8? If so we'll test it for sure!


Fix should be part of 3.10.8 which is awaiting release announcement.



Regards

Jo

 


 

-----Original message-----
From: Atin Mukherjee <amukherj@xxxxxxxxxx>
Sent: Mon 30-10-2017 17:40
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
To: Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx>;
CC: gluster-users@xxxxxxxxxxx;

On Sat, 28 Oct 2017 at 02:36, Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx> wrote:

Hello Atin,

 

 

I just read it and very happy you found the issue. We really hope this will be fixed in the next 3.10.7 version!

 
Not in 3.10.7, I'm afraid: the patch is still in review and 3.10.7 is getting tagged today. You'll get this fix in 3.10.8.
 

 

 

 

PS: Wow, nice, all that C code and those "goto out" statements (not always considered clean, but often the best way, I think). I can remember the days when I wrote kernel drivers myself in C :)

 

 

Regards

Jo Goossens

 

 


 

-----Original message-----
From: Atin Mukherjee <amukherj@xxxxxxxxxx>
Sent: Fri 27-10-2017 21:01
Subject: Re: BUG: After stop and start wrong port is advertised
To: Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx>;
CC: gluster-users@xxxxxxxxxxx;
We (finally) figured out the root cause, Jo!
 
Patch https://review.gluster.org/#/c/18579 posted upstream for review.

On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <jo.goossens@xxxxxxxxxxxxxxxx> wrote:

Hi,

 

 

We use glusterfs 3.10.5 on Debian 9.

 

When we stop or restart the service, e.g.: service glusterfs-server restart

 

We see that the wrong port gets advertised afterwards. For example:

 

Before restart:

 

Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
 
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
 
 
After restarting the service on one of the nodes (192.168.140.43), the advertised port appears to have changed (but the actual port didn't):
 
root@app3:/var/log/glusterfs#  gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
 
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
 
 
However, the active process STILL has the same PID AND is still listening on the old port:
 
root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
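
The comparison above (advertised port in "gluster volume status" versus the port the brick process really listens on) can be sketched in Python. This is only a rough illustration, not Mike Hulsman's script: the function names are hypothetical, and it assumes the two commands print exactly the column layout shown in this thread. It only parses captured text and does not touch glusterd.

```python
import re

# Sample text copied from the outputs quoted in this thread.
STATUS_SAMPLE = """\
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
"""
NETSTAT_SAMPLE = """\
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
"""

def advertised_brick_ports(status_text):
    """Map 'host:/path' -> (advertised TCP port, pid) from `gluster volume status` text."""
    bricks = {}
    for line in status_text.splitlines():
        # Columns assumed: Brick <host:path> <tcp port> <rdma port> <online> <pid>
        m = re.match(r"Brick\s+(\S+)\s+(\d+)\s+\S+\s+[YN]\s+(\d+)\s*$", line.strip())
        if m:
            bricks[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return bricks

def listening_ports(netstat_text):
    """Map pid -> set of listening TCP ports from `netstat -tapn` text."""
    ports = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        pid = fields[6].split("/", 1)[0]
        if not pid.isdigit():  # netstat prints "-" when the pid is not visible
            continue
        port = int(fields[3].rsplit(":", 1)[1])
        ports.setdefault(int(pid), set()).add(port)
    return ports

def find_mismatches(status_text, netstat_text, local_host):
    """Return (brick, advertised port, actual ports) for bricks on local_host
    whose advertised port is not among the ports their pid listens on."""
    listening = listening_ports(netstat_text)
    mismatches = []
    for brick, (port, pid) in advertised_brick_ports(status_text).items():
        if not brick.startswith(local_host + ":"):
            continue  # netstat only sees processes on the local node
        actual = listening.get(pid)
        if actual is not None and port not in actual:
            mismatches.append((brick, port, sorted(actual)))
    return mismatches
```

With the sample text from this thread and local_host set to "192.168.140.43", find_mismatches reports the brick as advertised on 49154 while pid 5913 listens on 49152.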
 
 
The other nodes' logs fill up with errors because they can't reach the daemon anymore. They try to reach it on the "new" port instead of the old one:
 
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
 
So they now try 49154 instead of the old 49152.
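
To see at a glance which endpoint the clients keep failing on, the "Connection refused" lines can be tallied per host and port. The following is a small Python sketch with a hypothetical helper name; it assumes the log message wording is exactly as in the excerpt above and has not been checked against other GlusterFS versions.

```python
import re
from collections import Counter

# Matches the socket_connect_finish error lines quoted in this thread.
REFUSED = re.compile(
    r"connection to (?P<host>[\d.]+):(?P<port>\d+) failed \(Connection refused\)"
)

def refused_endpoints(log_text):
    """Count refused connection attempts per (host, port) in client log text."""
    counts = Counter()
    for m in REFUSED.finditer(log_text):
        counts[(m.group("host"), int(m.group("port")))] += 1
    return counts
```

Fed the log excerpt above, every counted attempt lands on 192.168.140.43:49154, the port that nothing is listening on.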
 
Is this also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.
 
 
 
 

 

Regards

Jo Goossens

 


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
- Atin (atinm)
--

Alan Orth
alan.orth@xxxxxxxxx
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

Attachment: gluster-logs-node-wingu3-2017-01-22.tar.gz
Description: GNU Zip compressed data
