Re: gluster peer probe failing

Any time estimate for when this fix will be released? - In the next 3.10 update (rastar to confirm the date).
Any recommended workaround? - You will probably need to clear the IP local reserved ports file (net.ipv4.ip_local_reserved_ports).
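
A minimal C sketch of that workaround. It assumes that "the IP local reserved ports file" means the procfs entry behind the sysctl, /proc/sys/net/ipv4/ip_local_reserved_ports, and that writing an empty line to it clears the reserved list. The helper name clear_reserved_ports is hypothetical. The path is a parameter so the function can be tried on a scratch file first. Writing the real procfs entry requires root, and the change only lasts until the value is set again (for example by sysctl configuration applied at boot).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: overwrite the reserved-ports list at `path` with an
 * empty line. Assumption: on Linux, writing an empty list to
 * /proc/sys/net/ipv4/ip_local_reserved_ports clears every reserved port
 * (root required). Returns 0 on success, -1 on failure. */
static int clear_reserved_ports(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");   /* "w" also truncates a scratch file */
    if (fp == NULL)
        return -1;
    int write_failed = (fputs("\n", fp) == EOF);
    /* stdio buffers the write, so a rejected value may only surface here */
    int close_failed = (fclose(fp) == EOF);
    return (write_failed || close_failed) ? -1 : 0;
}
```

Called as clear_reserved_ports("/proc/sys/net/ipv4/ip_local_reserved_ports") before retrying the probe. The original value (30000-32767 in this thread) can be written back once a fixed 3.10 update is installed.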


On Tue, Jun 20, 2017 at 2:36 PM, Guy Cukierman <guyc@xxxxxxxxxxx> wrote:

Thanks Gaurav!

  1. Any time estimate for when this fix will be released?
  2. Any recommended workaround?

Best,

Guy.

From: Gaurav Yadav [mailto:gyadav@xxxxxxxxxx]
Sent: Tuesday, June 20, 2017 9:46 AM
To: Guy Cukierman <guyc@xxxxxxxxxxx>
Cc: Atin Mukherjee <amukherj@xxxxxxxxxx>; gluster-users@xxxxxxxxxxx
Subject: Re: gluster peer probe failing

Hi,

I am able to recreate the issue and here is my RCA.

The maximum value, i.e. 32767, overflows while it is being manipulated, and this case was not handled properly before.
Hence glusterd was crashing with SIGSEGV.

The issue is being fixed under "https://bugzilla.redhat.com/show_bug.cgi?id=1454418", and the fix is being backported as well.
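
To make the RCA above concrete, here is a minimal C illustration of how an upper bound of exactly 32767 can break a port loop. It is not the glusterfs code. The helper names are hypothetical, and the 16-bit signed counter is an assumption chosen because 32767 is INT16_MAX. Incrementing such a counter at 32767 produces 32768, which does not fit, and the conversion back to int16_t is implementation-defined in C (GCC and Clang wrap to -32768). The condition i <= 32767 therefore never becomes false. In a loop that sets bits in a reserved-ports bitmap, the wrapped negative index would write outside the array, which is consistent with the SIGSEGV inside gf_ports_reserved in the backtraces in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_BITMAP_BYTES (65536 / 8)  /* one bit per port 0..65535 */

/* Illustration only: counts the iterations of a loop with an int16_t counter
 * over the inclusive range [lo, hi]. A buggy loop would set bitmap bits here;
 * this version only counts and gives up after `cap` iterations so the demo
 * always terminates. Returns -1 if the loop never finished on its own. */
static long int16_loop_iterations(int16_t lo, int16_t hi, long cap)
{
    long iterations = 0;
    /* At i == INT16_MAX, i + 1 == 32768 does not fit in int16_t; GCC/Clang
     * wrap it to -32768, so i <= hi stays true when hi == INT16_MAX. */
    for (int16_t i = lo; i <= hi; i = (int16_t)(i + 1)) {
        if (++iterations > cap)
            return -1;
    }
    return iterations;
}

/* Safer variant: an int counter can hold hi + 1 for any port number, and the
 * range is validated before the bitmap is indexed. Returns the number of
 * ports marked, or -1 for an invalid range. */
static int mark_reserved_ports(unsigned char *bitmap, int lo, int hi)
{
    assert(bitmap != NULL);
    if (lo < 0 || hi > 65535 || lo > hi)
        return -1;
    for (int port = lo; port <= hi; port++)
        bitmap[port / 8] |= (unsigned char)(1u << (port % 8));
    return hi - lo + 1;
}
```

With the 30000-32767 setting from this thread, int16_loop_iterations(30000, 32767, 100000) returns -1 (the loop never ends), while 30000-32766 finishes after 2767 iterations. The second function shows the straightforward remedy of a wider counter plus a range check; the actual patch for bug 1454418 may fix it differently.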

 

 

Thanks

Gaurav

On Tue, Jun 20, 2017 at 6:43 AM, Gaurav Yadav <gyadav@xxxxxxxxxx> wrote:

Hi,

I tried this on my host with the corresponding ports set, but I could not reproduce the issue locally.

However, from the logs you sent it is pretty clear that the issue is related to the ports.

I will try to reproduce it on another machine and will update you as soon as possible.

Thanks

Gaurav

On Sun, Jun 18, 2017 at 12:37 PM, Guy Cukierman <guyc@xxxxxxxxxxx> wrote:

Hi,

Below please find the reserved ports and log, thanks.

sysctl net.ipv4.ip_local_reserved_ports:

net.ipv4.ip_local_reserved_ports = 30000-32767

glusterd.log:

[2017-06-18 07:04:17.853162] I [MSGID: 106487] [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 192.168.1.17 24007

[2017-06-18 07:04:17.853237] D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr] 0-management: 192.168.1.17

[2017-06-18 07:04:17.854093] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk

The message "D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr] 0-management: 192.168.1.17 " repeated 2 times between [2017-06-18 07:04:17.853237] and [2017-06-18 07:04:17.853869]

[2017-06-18 07:04:17.854093] D [MSGID: 0] [common-utils.c:3377:gf_is_local_addr] 0-management: 192.168.1.17 is not local

[2017-06-18 07:04:17.854221] D [MSGID: 0] [glusterd-peer-utils.c:132:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: 192.168.1.17

[2017-06-18 07:04:17.854271] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk

[2017-06-18 07:04:17.854269] D [MSGID: 0] [glusterd-peer-utils.c:132:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: 192.168.1.17

[2017-06-18 07:04:17.854271] D [MSGID: 0] [glusterd-peer-utils.c:246:glusterd_peerinfo_find] 0-management: Unable to find hostname: 192.168.1.17

[2017-06-18 07:04:17.854306] I [MSGID: 106129] [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 192.168.1.17 (24007)

[2017-06-18 07:04:17.854343] D [MSGID: 0] [glusterd-peer-utils.c:486:glusterd_peer_hostname_new] 0-glusterd: Returning 0

[2017-06-18 07:04:17.854367] D [MSGID: 0] [glusterd-utils.c:7060:glusterd_sm_tr_log_init] 0-glusterd: returning 0

[2017-06-18 07:04:17.854387] D [MSGID: 0] [glusterd-store.c:4092:glusterd_store_create_peer_dir] 0-glusterd: Returning with 0

[2017-06-18 07:04:17.854918] D [MSGID: 0] [store.c:420:gf_store_handle_new] 0-: Returning 0

[2017-06-18 07:04:17.855083] D [MSGID: 0] [store.c:374:gf_store_save_value] 0-management: returning: 0

[2017-06-18 07:04:17.855130] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk

The message "D [MSGID: 0] [store.c:374:gf_store_save_value] 0-management: returning: 0" repeated 2 times between [2017-06-18 07:04:17.855083] and [2017-06-18 07:04:17.855128]

[2017-06-18 07:04:17.855129] D [MSGID: 0] [glusterd-store.c:4221:glusterd_store_peer_write] 0-glusterd: Returning with 0

[2017-06-18 07:04:17.856294] D [MSGID: 0] [glusterd-store.c:4247:glusterd_store_perform_peer_store] 0-glusterd: Returning 0

[2017-06-18 07:04:17.856332] D [MSGID: 0] [glusterd-store.c:4268:glusterd_store_peerinfo] 0-glusterd: Returning with 0

[2017-06-18 07:04:17.856365] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout

[2017-06-18 07:04:17.856387] D [MSGID: 0] [glusterd-handler.c:3474:glusterd_transport_inet_options_build] 0-glusterd: Returning 0

[2017-06-18 07:04:17.856409] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600

[2017-06-18 07:04:17.856421] D [rpc-clnt.c:1071:rpc_clnt_connection_init] 0-management: setting ping-timeout to 30

[2017-06-18 07:04:17.856434] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so

[2017-06-18 07:04:17.856580] D [socket.c:4082:socket_init] 0-management: Configued transport.tcp-user-timeout=-1

[2017-06-18 07:04:17.856594] D [socket.c:4165:socket_init] 0-management: SSL support on the I/O path is NOT enabled

[2017-06-18 07:04:17.856625] D [socket.c:4168:socket_init] 0-management: SSL support for glusterd is NOT enabled

[2017-06-18 07:04:17.856634] D [socket.c:4185:socket_init] 0-management: using system polling thread

[2017-06-18 07:04:17.856664] D [name.c:168:client_fill_address_family] 0-management: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 192.168.1.17)

[2017-06-18 07:04:17.861800] D [MSGID: 0] [common-utils.c:334:gf_resolve_ip6] 0-resolver: returning ip-192.168.1.17 (port-24007) for hostname: 192.168.1.17 and port: 24007

[2017-06-18 07:04:17.861830] D [socket.c:2982:socket_fix_ssl_opts] 0-management: disabling SSL for portmapper connection

[2017-06-18 07:04:17.861885] D [MSGID: 0] [common-utils.c:3106:gf_ports_reserved] 0-glusterfs: lower: 30000, higher: 32767

[2017-06-18 07:04:17.861920] D [logging.c:1764:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages

[2017-06-18 07:04:17.861933] D [logging.c:1767:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages

pending frames:

frame : type(0) op(0)

patchset: git://git.gluster.org/glusterfs.git

signal received: 11

time of crash:

2017-06-18 07:04:17

configuration details:

argp 1

backtrace 1

dlfcn 1

libpthread 1

llistxattr 1

setfsid 1

spinlock 1

epoll.h 1

xattr.h 1

st_atim.tv_nsec 1

package-string: glusterfs 3.10.3

/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7fbdf7c964d0]

/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fbdf7c9fdd4]

/lib64/libc.so.6(+0x35250)[0x7fbdf637a250]

/lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7fbdf7ca044c]

/lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7fbdf7ca070e]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)[0x7fbde9c24158]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_bind+0x93)[0x7fbde9c245a3]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)[0x7fbde9c21875]

/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7fbdf7a5ff89]

/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7fbdf7a60049]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218)[0x7fbdec7b5218]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843)[0x7fbdec7b5843]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0)[0x7fbdec7b5ae0]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890)[0x7fbdec7b8890]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20)[0x7fbdec7b8e20]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e)[0x7fbdec7b1f5e]

/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7fbdf7ccd750]

/lib64/libc.so.6(+0x46cf0)[0x7fbdf638bcf0]

---------

From: Gaurav Yadav [mailto:gyadav@xxxxxxxxxx]
Sent: Friday, June 16, 2017 5:47 AM
To: Atin Mukherjee <amukherj@xxxxxxxxxx>
Cc: Guy Cukierman <guyc@xxxxxxxxxxx>; gluster-users@xxxxxxxxxxx
Subject: Re: gluster peer probe failing

Could you please send me the output of the command "sysctl net.ipv4.ip_local_reserved_ports"?

Along with the command output, please send the logs so I can look into the issue.

Thanks

Gaurav

On Thu, Jun 15, 2017 at 4:28 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

+Gaurav, the author of the patch. Gaurav, can you please comment here?

On Thu, Jun 15, 2017 at 3:28 PM, Guy Cukierman <guyc@xxxxxxxxxxx> wrote:

Thanks, but my current settings are:

net.ipv4.ip_local_reserved_ports = 30000-32767

net.ipv4.ip_local_port_range = 32768    60999

That means the reserved ports are already within the short int range, so maybe I misunderstood something? Or is it a different issue?

From: Atin Mukherjee [mailto:amukherj@xxxxxxxxxx]
Sent: Thursday, June 15, 2017 10:56 AM
To: Guy Cukierman <guyc@xxxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: gluster peer probe failing

https://review.gluster.org/#/c/17494/ will fix it, and the next 3.10 update should include this fix.

If sysctl net.ipv4.ip_local_reserved_ports has any value beyond the short int range, this would be a problem with the current version.
Would you be able to reset the reserved ports temporarily to get this going?
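
Before applying a workaround, each node can be checked with a minimal C sketch like the one below. It parses a net.ipv4.ip_local_reserved_ports value (a comma-separated list of ports and lo-hi ranges, such as the 30000-32767 reported in this thread) and reports whether any reserved port reaches a threshold. The function name is hypothetical. The comparison is >= rather than >, because per the RCA elsewhere in this thread a range ending exactly at 32767 (INT16_MAX) already crashed glusterd, even though that value is still within the short int range.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a reserved-ports list such as "8080,30000-32767"
 * and return 1 if any listed port is >= threshold, 0 if none is, or -1 if the
 * string cannot be parsed. An empty (whitespace-only) list returns 0. */
static int reserved_ports_reach(const char *list, long threshold)
{
    assert(list != NULL);
    const char *p = list;
    int found = 0;

    while (isspace((unsigned char)*p))
        p++;
    while (*p != '\0') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0)
            return -1;                 /* no digits, or a negative port */
        long hi = lo;                  /* a single port is a range of one */
        p = end;
        if (*p == '-') {               /* "lo-hi" range */
            hi = strtol(p + 1, &end, 10);
            if (end == p + 1 || hi < lo)
                return -1;
            p = end;
        }
        if (hi >= threshold)           /* the range's top end decides */
            found = 1;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;                 /* unexpected character */
    }
    return found;
}
```

For example, read the current value from /proc/sys/net/ipv4/ip_local_reserved_ports into a buffer and call reserved_ports_reach(buf, INT16_MAX); for the 30000-32767 setting in this thread it returns 1.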
 

 

On Wed, Jun 14, 2017 at 8:32 PM, Guy Cukierman <guyc@xxxxxxxxxxx> wrote:

Hi,

I have a gluster (version 3.10.2) server running on a 3-node (CentOS 7) cluster.

Firewalld and SELinux are disabled, and I can telnet from each node to the others on port 24007.

When I try to create the first peering by running the following command on node1:

gluster peer probe <node2 ip address>

I get the error:

“Connection failed. Please check if gluster daemon is operational.”

And glusterd.log shows:

[2017-06-14 14:46:09.927510] I [MSGID: 106487] [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 192.168.1.17 24007

[2017-06-14 14:46:09.928560] I [MSGID: 106129] [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 192.168.1.17 (24007)

[2017-06-14 14:46:09.930783] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout

[2017-06-14 14:46:09.930837] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600

pending frames:

frame : type(0) op(0)

patchset: git://git.gluster.org/glusterfs.git

signal received: 11

time of crash:

2017-06-14 14:46:09

configuration details:

argp 1

backtrace 1

dlfcn 1

libpthread 1

llistxattr 1

setfsid 1

spinlock 1

epoll.h 1

xattr.h 1

st_atim.tv_nsec 1

package-string: glusterfs 3.10.3

/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f69625da4d0]

/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f69625e3dd4]

/lib64/libc.so.6(+0x35250)[0x7f6960cbe250]

/lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7f69625e444c]

/lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7f69625e470e]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)[0x7f6954568158]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_bind+0x93)[0x7f69545685a3]

/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)[0x7f6954565875]

/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f69623a3f89]

/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f69623a4049]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218)[0x7f69570f9218]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843)[0x7f69570f9843]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0)[0x7f69570f9ae0]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890)[0x7f69570fc890]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20)[0x7f69570fce20]

/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e)[0x7f69570f5f5e]

/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f6962611750]

/lib64/libc.so.6(+0x46cf0)[0x7f6960ccfcf0]

And a file is created under /var/lib/glusterd/peers/<node2 ip address>, which contains:

uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=192.168.1.17

The glusterd daemon then exits, and I cannot restart it until I delete this file from the peers folder.

Any idea what is wrong?

thanks!
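
The leftover peer file can be recognized by its all-zero uuid, apparently because glusterd crashed after storing the peer (the glusterd_store_peerinfo entries in the debug log) but before it learned the peer's real UUID. A minimal C sketch of that check, assuming the one key=value-per-line layout shown above; the function name is hypothetical and this is not a glusterd API. It only reports; deleting the file is still a manual step, as described in the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NULL_UUID "00000000-0000-0000-0000-000000000000"

/* Hypothetical check: returns 1 if the peer file at `path` has an all-zero
 * uuid, 0 if it has any other uuid, and -1 if the file cannot be opened or
 * contains no uuid line. Assumes one key=value entry per line. */
static int peer_file_has_null_uuid(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (strncmp(line, "uuid=", 5) == 0) {
            result = (strcmp(line + 5, NULL_UUID) == 0) ? 1 : 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

A return value of 1 for a file under /var/lib/glusterd/peers/ points to a probe that most likely never completed, like the one described above.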


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users